Interpretation of Coefficients in Multiple Linear Regression Models
\(\widehat{\alpha}\) is the \(\widehat{Y}\) when all \(X\) variables equal 0
Because there are multiple \(X\) variables, there are multiple \(\widehat{\beta}\) coefficients (one for each \(X\) variable)
Each \(\widehat{\beta}_j\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X_j\)=1, while holding all other \(X\) variables constant
Interpretation of \(\widehat{\beta_1}\) When X\(_1\) Is the Treatment Variable and the Other \(X\) Variables Are All the Confounding Variables
Adding all confounders as controls in the model makes treatment and control groups comparable
As a result, we can interpret \(\widehat{\beta}_1\) using causal language
\(\widehat{\beta}_1\) is the \(\triangle \widehat{Y}\) caused by the presence of the treatment (\(\triangle X_1\)=1), while holding all confounders constant
\(\widehat{\beta}_1\) should be a valid estimate of the average treatment effect if all confounders are in the model (and assuming the linear model we are using reflects the true relationship between all the \(X\) variables and Y)
Intuitively, by adding \(Z\) as a control variable in the model, we statistically hold the values of \(Z\) constant, blocking the path shown with a dashed line
With this path blocked:
no changes in \(Y\) can be attributed to changes in \(Z\)
since the value of \(Z\) is being held constant, the only remaining source of change in \(Y\) is a change in \(X\)
In other words, the difference in the average outcomes between the treatment and control groups that remains after holding all confounding variables constant can now be directly attributed to their difference with respect to the treatment (treated vs. untreated), because no other differences between the two groups are in play
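The intuition above can be checked with a small simulation. This is a sketch under assumed, made-up conditions (not data from the course): a hypothetical confounder `Z` raises both the chance of treatment and the outcome, and the true effect of the treatment `X` on `Y` is set to 2. Leaving `Z` out keeps the path through `Z` open; adding it as a control holds it constant.

```r
# Simulated data (hypothetical): Z is a confounder that affects both
# the treatment X and the outcome Y; the true effect of X on Y is 2.
set.seed(269)
n <- 10000
Z <- rnorm(n)                               # pre-treatment confounder
X <- rbinom(n, size = 1, prob = plogis(Z))  # units with higher Z are more likely to be treated
Y <- 1 + 2 * X + 3 * Z + rnorm(n)           # outcome depends on both X and Z

# Without the control, the path through Z stays open: the estimate is biased upward
fit_naive <- lm(Y ~ X)
# Adding Z as a control holds it constant and blocks that path
fit_control <- lm(Y ~ X + Z)

coef(fit_naive)["X"]    # noticeably larger than 2
coef(fit_control)["X"]  # close to 2
```

Because the simulation fixes the true effect, it shows what "blocking the path" buys us; with real observational data we never know the true value and must argue that all confounders are included.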
Does this mean that we should add to the model as many control variables as possible? No!
For example, we should make sure not to control for post-treatment variables, which are variables affected by the treatment
Adding a post-treatment variable to the model would render our causal estimates invalid because we would be controlling for a consequence of the treatment when trying to estimate its total effect
Consider the causal diagram below
Suppose that we control for the post-treatment variable \(M\) when estimating the causal effect of \(X\) on \(Y\)
Doing so would block the causal path going from \(X\) to \(Y\) through \(M\), which is one of the ways by which changes in \(X\) cause changes in \(Y\), and therefore represents a portion of the total causal effect of \(X\) on \(Y\)
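A simulation sketch of this problem, with all numbers assumed for illustration: the treatment `X` affects `Y` directly (effect 1) and also through the post-treatment variable `M` (X raises M by 2, and M raises Y by 1.5), so the total effect of `X` on `Y` is 1 + 2 × 1.5 = 4. Controlling for `M` recovers only the direct part.

```r
# Simulated data (hypothetical): X -> Y directly, and X -> M -> Y
set.seed(269)
n <- 10000
X <- rbinom(n, size = 1, prob = 0.5)  # treatment, randomly assigned here
M <- 2 * X + rnorm(n)                 # post-treatment variable: a consequence of X
Y <- 1 * X + 1.5 * M + rnorm(n)       # total effect of X on Y = 1 + 2 * 1.5 = 4

fit_total <- lm(Y ~ X)      # does not control for M
fit_post  <- lm(Y ~ X + M)  # controls for the post-treatment variable M

coef(fit_total)["X"]  # close to 4: the total effect
coef(fit_post)["X"]   # close to 1: only the part that does not run through M
```

Even though `X` is randomly assigned in this sketch, adding `M` as a control makes the estimate miss three quarters of the total effect.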
Does the Death of the Leader Increase the Level of Democracy?
Dataset on assassinations and assassination attempts against political leaders from 1875 to 2004
To begin with, let’s assume that, after an assassination attempt, whether the leader dies is close to random, and thus, the assassination attempts where the leader ended up dying should be, on average, comparable to those where the leader ended up surviving
If this is true, we can estimate the average treatment effect of the death of the leader by computing the difference-in-means estimator
As we saw in the last class, we can compute the difference-in-means estimator by fitting a simple linear model where \(X\) is the treatment variable
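A minimal sketch of that equivalence on made-up numbers (the vectors `X` and `Y` are hypothetical toy data, not the leaders dataset): with a binary treatment, the slope \(\widehat{\beta}\) of the simple linear model equals the difference-in-means estimator.

```r
# Hypothetical toy data: a binary treatment X and an outcome Y
X <- c(0, 0, 0, 1, 1, 1)
Y <- c(2, 4, 3, 6, 5, 7)

diff_means <- mean(Y[X == 1]) - mean(Y[X == 0])  # difference-in-means estimator
slope <- coef(lm(Y ~ X))[["X"]]                  # beta-hat from the simple linear model

diff_means  # 3
slope       # also 3 (up to floating-point error)
```

The intercept of the same model equals the average outcome in the control group (here 3), which is why \(\widehat{\alpha}\) is interpreted as \(\widehat{Y}\) when \(X\)=0.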
The leaders dataset

| variable | description |
|---|---|
| year | year of the assassination attempt |
| country | name of the country where the assassination attempt took place |
| leadername | name of the leader targeted in the assassination attempt |
| died | whether the leader died as a result of the assassination attempt: 1=yes, 0=no |
| politybefore | polity score of the country before the assassination attempt (in points, on a scale from -10 to 10) |
| polityafter | polity score of the country after the assassination attempt (in points, on a scale from -10 to 10) |
In-Class Exercise: What is the Effect of the Death of the Leader on the Level of Democracy?
Open RStudio
Download exercise_4.R from the website and open it within RStudio
Run steps 1 through 3
setwd("~/Desktop/POL269") # if Mac
# setwd("C:/user/Desktop/POL269") # if Windows (use this line instead of the one above)
library(tidyverse) # loads tidyverse package
## STEP 2: Load the dataset
leaders <- read.csv("leaders.csv") # reads and stores data
## STEP 3: Understand the data
head(leaders) # shows first observations
Call:
lm(formula = polityafter ~ died, data = leaders)
Coefficients:
(Intercept) died
-1.895 1.132
Fitted model: \(\widehat{\textrm{polityafter}}\) = -1.90 + 1.13 died
Interpretation of \(\widehat{\beta}\)?
definition: \(\widehat{\beta}\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X\)=1
here: \(\widehat{\beta}\) = 1.13 is the \(\triangle \widehat{\textrm{polityafter}}\) associated with \(\triangle\)died=1
in words: the death of the leader (i.e., an increase in died of 1 by going from died=0 to died=1) is associated with a predicted increase in polity scores after the assassination attempt of 1.13 points, on average
unit of measurement of \(\widehat{\beta}\)? same as \(\triangle \overline{Y}\); here, Y is nonbinary and measured in points so \(\triangle \overline{Y}\) and \(\widehat{\beta}\) are measured in points
Interpretation of \(\widehat{\beta}\)?
Since here \(X\) is the treatment variable and \(Y\) is the outcome variable of interest, \(\widehat{\beta}\) is equivalent to the difference-in-means estimator, so we should interpret \(\widehat{\beta}\) using causal language
Causal language: We estimate that the death of the leader increases polity scores after the assassination attempt by 1.13 points, on average
This should be a valid estimate of the average treatment effect if the assassination attempts where the leader died are comparable to those where the leader did not die
Is this true? Let’s see how the two groups compare to each other in terms of politybefore (a pre-treatment characteristic)
STEP 5: Identify confounding variables
Calculate the average politybefore for the two groups:
# A tibble: 2 × 2
died mean_politybefore
<int> <dbl>
1 0 -1.74
2 1 -0.704
Countries where the assassination attempt ended up being successful were, on average, slightly more democratic to begin with than countries where the assassination attempt ended up not being successful
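The tibble output suggests the group means were computed with tidyverse functions; as a dependency-free sketch, the same kind of balance check can be done in base R with `tapply()`. The data frame `toy` below is a hypothetical stand-in with made-up values, not the real leaders data.

```r
# Hypothetical stand-in for the leaders data (values made up for illustration)
toy <- data.frame(
  died         = c(0, 0, 0, 1, 1),
  politybefore = c(-7, -2, 3, -1, 5)
)

# Average of the pre-treatment characteristic within each treatment group
group_means <- tapply(toy$politybefore, toy$died, mean)
group_means  # 0: -2, 1: 2

# Gap in the pre-treatment characteristic (treated minus control)
balance_diff <- as.numeric(group_means["1"] - group_means["0"])
balance_diff  # 4
```

A gap that is large relative to the scale of the variable (here, a -10 to 10 polity scale) is a warning sign that the treatment and control groups are not comparable.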
politybefore might be a confounding variable: it may affect both whether the leader dies (the treatment) and polityafter (the outcome)
STEP 6: Estimate average causal effect while controlling for confounders
To estimate the average treatment effect of the death of the leader while controlling for initial levels of democracy, we can fit the following model:
\[
\widehat{\textrm{polityafter}} = \widehat{\alpha} + \widehat{\beta}_1 \textrm{died} + \widehat{\beta}_2 \textrm{politybefore}
\]
To fit the model, we use the function lm()
we specify as the main argument a formula of the type Y \(\sim\) X\(_1\) + X\(_2\)
lm(polityafter ~ died + politybefore, data=leaders)
Call:
lm(formula = polityafter ~ died + politybefore, data = leaders)
Coefficients:
(Intercept) died politybefore
-0.4346 0.2616 0.8375
definition: \(\widehat{\beta}_1\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X_1\)=1, while holding all other \(X\) variables constant
here: \(\widehat{\beta}_1\) = 0.26 is the \(\triangle \widehat{\textrm{polityafter}}\) associated with \(\triangle\)died=1, while holding politybefore constant
in words: the death of the leader is associated with a predicted increase in polity scores after the assassination attempt of 0.26 points, on average, while holding polity scores before constant
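To see what "holding politybefore constant" means in the fitted model, fix politybefore at any value \(p\) (a placeholder symbol introduced here) and compare the predictions for died=1 and died=0. The politybefore term cancels, leaving only \(\widehat{\beta}_1\):

\[
\begin{aligned}
&\widehat{\textrm{polityafter}}(\textrm{died}{=}1,\ \textrm{politybefore}{=}p) - \widehat{\textrm{polityafter}}(\textrm{died}{=}0,\ \textrm{politybefore}{=}p) \\
&\quad = (-0.4346 + 0.2616 \times 1 + 0.8375\,p) - (-0.4346 + 0.2616 \times 0 + 0.8375\,p) \\
&\quad = 0.2616
\end{aligned}
\]

Because the result does not depend on \(p\), the same 0.26-point difference applies at every level of initial democracy in this linear model.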
unit of measurement of \(\widehat{\beta}_1\)? same as \(\triangle \overline{Y}\); here, Y is nonbinary and measured in points so \(\triangle \overline{Y}\) and \(\widehat{\beta}_1\) are measured in points
Interpretation of \(\widehat{\beta}_1\)?
Since X\(_1\) is the treatment variable, \(Y\) is the outcome variable, and X\(_2\) is the confounder, we can interpret \(\widehat{\beta}_1\) using causal language
Causal language: We estimate that the death of the leader increases polity scores after the assassination attempt by 0.26 points, on average, when holding polity scores before the assassination attempt constant
This should be a valid estimate of the average treatment effect if politybefore is the only confounder (and assuming the linear model we are using reflects the true relationship between all \(X\) variables and Y)
Note that once we control for politybefore the effect size decreases substantially (it goes from 1.13 to 0.26)
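One way to see why the estimate shrinks is the omitted-variable-bias identity for least squares: when both models are fitted to the same observations, the coefficient on died in the simple model equals \(\widehat{\beta}_1\) plus \(\widehat{\beta}_2\) times the gap in average politybefore between the two groups. Plugging in the rounded numbers reported in this section (assuming both models and the group means use the same observations, which we have not verified):

\[
\begin{aligned}
\widehat{\beta}^{\,\textrm{simple}} &= \widehat{\beta}_1 + \widehat{\beta}_2 \times \left(\overline{\textrm{politybefore}}_{\textrm{died}=1} - \overline{\textrm{politybefore}}_{\textrm{died}=0}\right) \\
&\approx 0.2616 + 0.8375 \times \big(-0.704 - (-1.74)\big) = 0.2616 + 0.8375 \times 1.036 \approx 1.13
\end{aligned}
\]

Most of the original 1.13-point estimate (about 0.87 points) is explained by the fact that countries where the leader died were more democratic before the attempt; the small gap from the reported 1.132 comes from rounding the group means.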
Based on this analysis, we estimate that the death of the leader increases the level of democracy of a country by a small amount
more on this later in the semester
Estimating Average Causal Effects Using Observational Data and Multiple Linear Regression Models
If, in the multiple linear regression model where \(X_{1}\) is the treatment variable, we control for all confounders by including them in the model as additional \(X\) variables, then we can interpret \(\widehat{\beta}_{1}\) as a valid estimate of the average causal effect of \(X_{1}\) on \(Y\)
This assumes, again, that
(a) we can identify and measure all confounders and
(b) the linear model we use reflects the true relationship between all the \(X\) variables and \(Y\).
Causal studies
So far we have learned how to estimate the average change in the outcome caused by the treatment
with experimental data: by computing the difference-in-means estimator directly or by fitting a simple linear regression model where X is the treatment variable (chapter 2 + chapter 5)
with observational data: by controlling for all confounding variables (chapter 5)
There are more issues we must consider when conducting or evaluating a scientific causal study, including the internal and external validity of the study
Internal Validity
Refers to the extent to which the causal assumptions are satisfied
It asks, is the estimated causal effect valid for the sample of observations in the study?
The answer depends on whether the treatment and control groups used for the estimation can be considered comparable, after statistical controls are applied (if any are). It depends on whether we have:
(a) eliminated all confounding variables by running a randomized experiment OR
(b) successfully controlled for all confounding variables when using observational data
External Validity
Refers to the extent to which the conclusions can be generalized
It asks, is the estimated causal effect valid beyond this particular study?
The answer depends on:
(i) whether the sample of observations in the study is representative of the population to which we want to generalize the results AND
(ii) whether the treatment used in the study is representative of the treatment for which we want to generalize the results
Randomized experiments tend to have strong internal validity but relatively weak external validity
random treatment assignment eliminates all potential confounding variables, BUT
sample of participants might not be representative of population and/or treatment might be unrealistic and not comparable to real-world treatments
Observational studies tend to have strong external validity but relatively weak internal validity
sample is usually representative of the population and treatment is usually realistic, BUT
possibility of uncontrolled confounding variables can’t be ruled out
This dynamic explains why scholars use both types of studies to estimate causal effects; they often have complementary strengths
Nonetheless, some studies based on experimental data have strong external validity and some studies based on observational data have strong internal validity
We should pay attention to the details of a study when evaluating it
Randomization Gives Us Super Powers
The Role of Randomization
When selecting observations from the population into the sample, random sampling
ensures sample is representative of target population
ensures strong external validity (assuming the treatment is realistic)
When deciding who receives the treatment and who doesn’t, random treatment assignment
eliminates confounders, making treatment and control groups comparable
ensures strong internal validity
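A simulation sketch of why random treatment assignment makes the groups comparable. Everything here is assumed for illustration: a hypothetical pre-treatment characteristic `pre` for 10,000 units, with exactly half of them assigned to treatment at random.

```r
# Simulated units (hypothetical): a pre-treatment characteristic for 10,000 units
set.seed(269)
n <- 10000
pre <- rnorm(n, mean = 50, sd = 10)

# Random treatment assignment: exactly half of the units, chosen at random, are treated
treated <- sample(rep(c(0, 1), each = n / 2))

# Average pre-treatment characteristic in each group
mean(pre[treated == 1])
mean(pre[treated == 0])
imbalance <- mean(pre[treated == 1]) - mean(pre[treated == 0])
imbalance  # small relative to the standard deviation of 10
```

Because the coin flip, not any characteristic of the units, decides who is treated, the groups differ only by chance on every pre-treatment characteristic, measured or not, and that chance imbalance shrinks as the sample grows.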
The ideal research design for estimating average treatment effects would make use of the two kinds of randomization we have seen
Assuming that we were also able to make the treatment as realistic as possible, this design would create a study with strong external and internal validity
For ethical, logistical, and financial reasons, few studies include both types of randomization
Let’s evaluate the causal studies we have seen thus far!
Experiment conducted in Michigan, where registered voters were randomly assigned to either (a) receive a message designed to induce social pressure to vote or (b) receive nothing
Internal validity: Strong
Why? This is a randomized experiment. Random treatment assignment should have eliminated all confounding variables. Registered voters who received the message should be comparable to registered voters who did not
External validity: Depends on population and treatment we want to generalize the results to
Could we generalize the results to using the same message with all voting-age residents in Michigan? Yes, if the sample of registered voters who were part of the study is representative of all voting-age residents in Michigan
Could we generalize the results to using the same message in Massachusetts? Depends on how different voting-age residents in Massachusetts are from the sample of registered voters in Michigan who were part of the study
Do Women Promote Different Policies than Men?
(Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. “Women as Policy Makers: Evidence from a Randomized Policy Experiment in India.” Econometrica, 72 (5): 1409–43.)
Experiment conducted in India, where rural villages were randomly assigned to have a female politician
Internal validity: Strong
Why? This is a randomized experiment. Random treatment assignment should have eliminated all confounding variables. Villages that were assigned to have a female politician should be comparable to villages that were not
External validity: Depends on population and treatment we want to generalize the results to
Could we generalize the results to having a female politician across the whole of India? Not really; probably only to the rural areas that the sample of villages in the study is representative of
Could we generalize the results to having a female politician in U.S. towns? Absolutely not. Rural villages in India are not representative of U.S. towns
Does the Death of the Leader Increase the Level of Democracy?
(Based on Benjamin F. Jones and Benjamin A. Olken. 2009. “Hit or Miss? The Effect of Assassinations on Institutions and War.” American Economic Journal: Macroeconomics, 1 (2): 55-87.)
Observational data from assassination attempts of leaders around the world
Internal validity of study without controls: Weak
Why? This is NOT a randomized experiment, and we should worry about confounding variables such as politybefore that make the treatment and control groups not comparable. We just learned that countries where the assassination attempt ended up being successful were, on average, slightly more democratic to begin with than countries where the assassination attempt ended up not being successful. Therefore, if we do not control for politybefore in the estimation process, we end up with an invalid estimate of the average treatment effect.
Internal validity of study with controls: Stronger than without controls (but unclear whether it is strong)
Why? Controlling for politybefore in the estimation process should help make treatment and control groups more comparable after controls are applied. However, there might still be some confounders we have failed to observe and control for, therefore, we cannot say for sure that the internal validity is strong.
External validity: Depends on population and treatment we want to generalize the results to
Could we generalize the results to the death of the leader in all countries? Probably not. We should probably only generalize to countries with assassination attempts (which tend to be less democratic to begin with).
Today’s class
How to Use Multiple Linear Regression Models to Control for Confounders and Estimate Average Treatment Effects Using Observational Data