11.3.2 primary_06 ~ treatment + sex
In this section, we will look at the relationship between primary voting and treatment + sex.
The math:
Without variable names:
\[ y_{i} = \beta_{0} + \beta_{1}x_{i,1} + \beta_{2}x_{i,2} + \dots + \beta_{n}x_{i,n} + \epsilon_{i} \]
With variable names:
\[ y_{i} = \beta_{0} + \beta_{1}civic\_duty_i + \beta_{2}hawthorne_i + \beta_{3}self_i + \beta_{4}neighbors_i + \beta_{5}male_i + \epsilon_{i} \]
There are two ways to formalize the model used in fit_1: without and with the variable names. The first is related to the concept of Justice: we acknowledge that the model is constructed as a linear sum of n parameters times the values of n variables, along with an intercept and an error term. In other words, it is a linear model. The only other model we have learned this semester is a logistic model, but there are other kinds of models, each defined by the mathematics and the assumptions about the error term.
The second type of formal notation, more associated with the virtue Courage, includes the actual variable names we are using. The trickiest part is the transformation of character/factor variables into indicator variables, meaning variables with 0/1 values. Because treatment has 5 levels, we need 4 indicator variables. The fifth level, which by default is the first value alphabetically (for character variables) or the first level (for factor variables), is incorporated in the intercept.
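The indicator-variable transformation described above can be seen directly with base R's model.matrix(). The following is a minimal sketch on a made-up toy data frame (the `toy` object and its rows are hypothetical, not the real data), assuming treatment is a factor whose first level is Control and sex is a character variable.

```r
# Hypothetical toy data, for illustration only (not the real data set).
toy <- data.frame(
  treatment = factor(
    c("Control", "Civic Duty", "Hawthorne", "Self", "Neighbors", "Control"),
    # Assumption: Control is the first factor level, so it becomes the baseline.
    levels = c("Control", "Civic Duty", "Hawthorne", "Self", "Neighbors")
  ),
  # sex is a character variable, so its baseline is the first value
  # alphabetically ("Female").
  sex = c("Female", "Male", "Female", "Male", "Female", "Male")
)

# model.matrix() builds the matrix of 0/1 indicator columns implied by a formula.
X <- model.matrix(~ treatment + sex, data = toy)

# One intercept column, 4 treatment indicators, and 1 sex indicator.
colnames(X)

# The first row is a Control/Female observation: every indicator is 0,
# so only the intercept contributes.
X[1, ]
```

The column names match the coefficient names that R uses for a fitted model with this formula, such as treatmentCivic Duty and sexMale.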
Let’s translate the model into code.
fit_1 <- stan_glm(data = object_1,
                  formula = primary_06 ~ treatment + sex,
                  refresh = 0,
                  seed = 987)
print(fit_1, digits = 3)
## stan_glm
## family: gaussian [identity]
## formula: primary_06 ~ treatment + sex
## observations: 344084
## predictors: 6
## 
##                     Median MAD_SD
## (Intercept)         0.291  0.001
## treatmentCivic Duty 0.018  0.003
## treatmentHawthorne  0.026  0.003
## treatmentSelf       0.048  0.002
## treatmentNeighbors  0.081  0.003
## sexMale             0.012  0.002
##
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 0.464  0.001
##
## 
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
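The Median and MAD_SD columns summarize the posterior draws for each parameter: the median is the point estimate, and the scaled median absolute deviation is a robust measure of its uncertainty. As a rough sketch of how such summaries are computed, the code below uses simulated numbers as a stand-in for posterior draws. The `draws` vector is made up for illustration and is not taken from fit_1.

```r
# Simulated stand-in for posterior draws of one coefficient.
# These are NOT real draws from fit_1; the mean and sd are chosen only to
# resemble the sexMale row of the printed output.
set.seed(1)
draws <- rnorm(4000, mean = 0.012, sd = 0.002)

# median() gives the point estimate. mad() gives the median absolute
# deviation, scaled by default (constant = 1.4826) so that it is comparable
# to a standard deviation for roughly normal draws.
c(Median = median(draws), MAD_SD = mad(draws))
```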
We will now create a table that nicely formats the results of fit_1 using the tbl_regression() function from the gtsummary package. It will also display the associated 95% confidence interval for each coefficient.
tbl_regression(fit_1,
               intercept = TRUE,
               estimate_fun = function(x) style_sigfig(x, digits = 3)) %>%
  # Using Beta as the name of the parameter column is weird.
  as_gt() %>%
  tab_header(title = md("**Likelihood of Voting in the Next Election**"),
             subtitle = "How Treatment Assignment and Sex Predict Likelihood of Voting") %>%
  tab_source_note(md("Source: Gerber, Green, and Larimer (2008)")) %>%
  cols_label(estimate = md("**Parameter**"))

Likelihood of Voting in the Next Election
How Treatment Assignment and Sex Predict Likelihood of Voting

| Characteristic | Parameter | 95% CI^{1} |
|----------------|-----------|--------------|
| (Intercept)    | 0.291     | 0.288, 0.293 |
| treatment      |           |              |
| Control        | —         | —            |
| Civic Duty     | 0.018     | 0.013, 0.023 |
| Hawthorne      | 0.026     | 0.021, 0.031 |
| Self           | 0.048     | 0.044, 0.054 |
| Neighbors      | 0.081     | 0.076, 0.087 |
| sex            |           |              |
| Female         | —         | —            |
| Male           | 0.012     | 0.009, 0.015 |

Source: Gerber, Green, and Larimer (2008)
^{1} CI = Confidence Interval

Interpretation:
* The intercept of this model is the expected value of the probability of someone voting in the 2006 primary given that they are part of the control group and are female. In this case, we estimate that women in the control group will vote ~29.1% of the time.
* The coefficient for sexMale indicates the difference in likelihood of voting between a male and a female. In other words, when comparing men and women, the 0.012 implies that men are ~1.2 percentage points more likely to vote than women. Note that, because this is a linear model with no interactions between sex and other variables, this difference applies to any male, regardless of the treatment he received. Because sex cannot be manipulated (by assumption), we should not use a causal interpretation of the coefficient.
* The coefficients of the treatments, on the other hand, do have a causal interpretation. For a single individual, of either sex, being sent the Self postcard increases the probability of voting by 4.8 percentage points. It appears that the Neighbors treatment is the most effective at ~8.1 percentage points and Civic Duty is the least effective at ~1.8 percentage points.
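Because the model is additive, the expected probability of voting for any combination of treatment and sex is the intercept plus the relevant coefficients. The sketch below works this out from the posterior medians printed above. It hard-codes those medians rather than extracting them from fit_1, and the object names are chosen here for illustration.

```r
# Posterior medians copied by hand from the printed output of fit_1.
intercept <- 0.291

# Baseline levels (Control, Female) contribute 0 because they are
# incorporated in the intercept.
treatment_effect <- c("Control"    = 0,
                      "Civic Duty" = 0.018,
                      "Hawthorne"  = 0.026,
                      "Self"       = 0.048,
                      "Neighbors"  = 0.081)
sex_effect <- c("Female" = 0, "Male" = 0.012)

# outer() adds every treatment effect to every sex effect, giving a
# 5 x 2 table of expected probabilities of voting.
expected <- intercept + outer(treatment_effect, sex_effect, "+")
round(expected, 3)
```

For example, the entry for a man who received the Neighbors postcard is 0.291 + 0.081 + 0.012 = 0.384. With no interaction terms, the Male column exceeds the Female column by the same 0.012 in every row.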