Week 8: Controlling for Confounders Using Multiple Linear Regression & Validity

POL269 Political Research

Javier Sajuria

2024-11-03

Plan for today

  • How Can We Estimate Causal Effects with Observational Data?
    • Multiple Linear Regression Models
      • Interpretation of Coefficients
      • Interpretation of \(\widehat{\beta_1}\) When X\(_1\) Is the Treatment Variable and the Other \(X\) Variables Are All the Confounding Variables
    • What is the Effect of the Death of the Leader on the Level of Democracy?
  • Internal Validity vs. External Validity
  • Randomized Experiments vs. Observational Studies
  • The Role of Randomization

How Can We Estimate Causal Effects with Observational Data?

  • We cannot rely on random treatment assignment to eliminate potential confounders and make treatment and control groups comparable
  • First, we need to identify and measure all confounders: variables that affect both
    • (i) the likelihood of receiving the treatment and
    • (ii) the outcome

Multiple Linear Regression Models

\[ \widehat{Y}_i = \widehat{\alpha} + \widehat{\beta}_1 X_{i1}+... + \widehat{\beta}_p X_{ip} \]

where:

  • \(\widehat{Y_i}\) is the predicted value of \(Y\) for observation \(i\)

  • \(\widehat{\alpha}\) is the estimated intercept coefficient

  • each \(\widehat{\beta}_j\) (pronounced beta hat sub j) is the estimated coefficient for variable \(X_j\) (\(j{=} {1}, ..., p\))

  • each \(X_{ij}\) is the observed value of the variable \(X_j\) for observation \(i\) (\(j{=} {1}, ..., p\))

  • \(p\) is the total number of \(X\) variables in the model.

Simple linear regression:

  • \(\widehat{Y} = \widehat{\alpha} + \widehat{\beta} X\)
  • \(\widehat{\alpha}\): \(\widehat{Y}\) when \(X{=}{0}\)
  • \(\widehat{\beta}\): \(\triangle\widehat{Y}\) associated with \(\triangle X{=}{1}\)

Multiple linear regression:

  • \(\widehat{Y} = \widehat{\alpha} + \widehat{\beta}_{1} X_{1}+ ... + \widehat{\beta}_p X_{p}\)
  • \(\widehat{\alpha}\): \(\widehat{Y}\) when all \(X_j{=}{0}\) (\(j{=}{1},...,p\))
  • each \(\widehat{\beta}_j\): \(\triangle\widehat{Y}\) associated with \(\triangle X_j{=}{1}\), while holding all other \(X\) variables constant

Interpretation of Coefficients in Multiple Linear Regression Models

  • \(\widehat{\alpha}\) is the \(\widehat{Y}\) when all \(X_j{=}{0}\) (\(j{=}{1},...,p\))
  • Because there are multiple \(X\) variables, there are multiple \(\widehat{\beta}\) coefficients (one for each \(X\) variable)
  • Each \(\widehat{\beta}_j\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X_j\)=1, while holding all other \(X\) variables constant
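
The interpretation above can be checked with a minimal sketch in R. The coefficient values and the helper function predict_y() are hypothetical, chosen only for illustration; they do not come from any dataset used in this class.

```r
# Hypothetical estimated coefficients (illustration only)
alpha_hat <- 2.0
beta1_hat <- 0.5
beta2_hat <- -1.5

# Predicted Y for given values of X1 and X2
predict_y <- function(x1, x2) {
  alpha_hat + beta1_hat * x1 + beta2_hat * x2
}

# Intercept: predicted Y when all X variables equal 0
intercept_check <- predict_y(x1 = 0, x2 = 0)

# Increase X1 by one unit while holding X2 constant at 3
change_x1 <- predict_y(x1 = 5, x2 = 3) - predict_y(x1 = 4, x2 = 3)

# The change is the same whatever value X2 is held at
change_x1_other <- predict_y(x1 = 5, x2 = -2) - predict_y(x1 = 4, x2 = -2)

print(c(intercept_check, change_x1, change_x1_other))
```

Both differences equal beta1_hat because the X2 term cancels when X2 is held constant.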

Interpretation of \(\widehat{\beta_1}\) When X\(_1\) Is the Treatment Variable and the Other \(X\) Variables Are All the Confounding Variables

  • Adding all confounders as controls in the model makes treatment and control groups comparable
  • As a result, we can interpret \(\widehat{\beta}_1\) using causal language
  • \(\widehat{\beta}_1\) is the \(\triangle \widehat{Y}\) caused by the presence of the treatment (\(\triangle X_1\)=1), while holding all confounders constant
  • \(\widehat{\beta}_1\) should be a valid estimate of the average treatment effect if all confounders are in the model (and assuming the linear model we are using reflects the true relationship between all the \(X\) variables and Y)

Intuitively, by adding (Z) as a control variable in the model, we statistically hold the values of (Z) constant, blocking the path shown with a dashed line

  • With this path blocked:
    • no changes in (Y) can be attributed to changes in (Z)
    • since the value of (Z) is being held constant, the only remaining source of change in (Y) is a change in (X)

In other words, the difference in the average outcomes between the treatment and control groups that remains after holding all confounding variables constant can now be directly attributed to their difference with respect to the treatment (treated vs. untreated), because no other differences between the two groups are in play
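
A small simulation can make this intuition concrete. The sketch below uses made-up data in which a single confounder z raises both the chance of treatment and the outcome, and the true effect of the treatment x on y is set to 2. All names and numbers are assumptions for illustration.

```r
set.seed(269)
n <- 5000

# Simulated (hypothetical) data
z <- rnorm(n)                               # confounder
x <- rbinom(n, size = 1, prob = plogis(z))  # treatment is more likely when z is high
y <- 1 + 2 * x + 3 * z + rnorm(n)           # true effect of x on y is 2

# Without controlling for z, the estimate mixes the effect of x with that of z
naive <- coef(lm(y ~ x))[["x"]]

# Controlling for z holds it constant and should recover a value close to 2
controlled <- coef(lm(y ~ x + z))[["x"]]

print(c(naive = naive, controlled = controlled))
```

In this setup the uncontrolled estimate should land well above 2, while the controlled estimate should be close to it.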

  • Does this mean that we should add to the model as many control variables as possible? No!
    • For example, we should make sure not to control for post-treatment variables, which are variables affected by the treatment
    • Adding a post-treatment variable to the model would render our causal estimates invalid because we would be controlling for a consequence of the treatment when trying to estimate its total effect

  • Consider the causal diagram below
  • Suppose that we control for the post-treatment variable (M) when estimating the causal effect of (X) on (Y)
  • Doing so would block the causal path going from (X) to (Y) through (M), which is one of the ways by which changes in (X) cause changes in (Y), and therefore represents a portion of the total causal effect of (X) on (Y)
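
The problem with post-treatment controls can be sketched with simulated data. In this hypothetical setup, x is randomly assigned, m is affected by x, and y depends on both, so the total effect of x on y is 1 (direct) plus 2 × 1.5 = 3 (through m), that is, 4. The variable names and values are assumptions for illustration.

```r
set.seed(269)
n <- 5000

# Simulated (hypothetical) data
x <- rbinom(n, size = 1, prob = 0.5)  # randomly assigned treatment
m <- 2 * x + rnorm(n)                 # post-treatment variable, affected by x
y <- 1 + 1 * x + 1.5 * m + rnorm(n)   # total effect of x on y: 1 + 2 * 1.5 = 4

# Without the post-treatment control: estimates the total effect (about 4)
total <- coef(lm(y ~ x))[["x"]]

# With the post-treatment control: the path through m is blocked (about 1)
with_m <- coef(lm(y ~ x + m))[["x"]]

print(c(total = total, with_m = with_m))
```

Controlling for m leaves only the part of the effect that does not run through m, so the estimate no longer reflects the total effect.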

Does the Death of the Leader Increase the Level of Democracy?

Based on Benjamin F. Jones and Benjamin A. Olken. 2009. “Hit or Miss? The Effect of Assassinations on Institutions and War.” American Economic Journal: Macroeconomics, 1 (2): 55-87.

  • We will answer this question by analysing observational data
  • Dataset on assassinations and assassination attempts against political leaders from 1875 to 2004
  • To begin with, let’s assume that, after an assassination attempt, the death of the leader is close to random, and thus, the assassination attempts where the leader ended up dying should be, on average, comparable to those where the leader ended up surviving
  • If this is true, we can estimate the average treatment effect of the death of the leader by computing the difference-in-means estimator
  • As we saw in the last class, we can compute the difference-in-means estimator by fitting a simple linear model where \(X\) is the treatment variable
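
As a reminder of why this works, here is a minimal sketch with simulated data (the variables treated and outcome are hypothetical, not the leaders data). With a binary X, the slope of the simple linear model equals the difference in group means.

```r
set.seed(269)

# Simulated (hypothetical) data: 50 untreated and 50 treated observations
treated <- rep(c(0, 1), each = 50)
outcome <- 3 + 1.2 * treated + rnorm(100)

# Difference-in-means estimator computed directly
diff_in_means <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])

# Slope of the simple linear model with the treatment as X
slope <- coef(lm(outcome ~ treated))[["treated"]]

print(all.equal(diff_in_means, slope))
```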

The leaders dataset

variable       description
year           year of the assassination attempt
country        name of the country where the assassination attempt took place
leadername     name of the leader who was the target of the assassination attempt
died           whether the leader died as a result of the assassination attempt: 1=yes, 0=no
politybefore   polity scores of the country before the assassination attempt (in points, on a scale from -10 to 10)
polityafter    polity scores of the country after the assassination attempt (in points, on a scale from -10 to 10)

In-Class Exercise: What is the Effect of the Death of the Leader on the Level of Democracy?

  1. Open RStudio
  2. Download exercise_4.R from the website and open it within RStudio
  3. Run steps 1 through 3

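## STEP 1: Set the working directory and load packages
## (run only the setwd() line that matches your system)
setwd("~/Desktop/POL269") # if Mac
setwd("C:/user/Desktop/POL269") # if Windows
library(tidyverse) # loads tidyverse package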

## STEP 2: Load the dataset 
leaders <- read.csv("leaders.csv") # reads and stores data

## STEP 3: Understand the data
head(leaders) # shows first observations
  year     country       leadername died politybefore polityafter
1 1929 Afghanistan Habibullah Ghazi    0           -6   -6.000000
2 1933 Afghanistan       Nadir Shah    1           -6   -7.333333
3 1934 Afghanistan      Hashim Khan    0           -6   -8.000000
4 1924     Albania             Zogu    0            0   -9.000000
5 1931     Albania             Zogu    0           -9   -9.000000
6 1968     Algeria      Boumedienne    0           -9   -9.000000
  • the treatment variable (X) is died
  • the outcome variable (Y) is polityafter

STEP 4: Compute difference-in-means estimator

 lm(polityafter ~ died, data=leaders)  

Call:
lm(formula = polityafter ~ died, data = leaders)

Coefficients:
(Intercept)         died  
     -1.895        1.132  
  • Fitted model: \(\widehat{\textrm{polityafter}}\) = -1.90 + 1.13 died

  • Interpretation of \(\widehat{\beta}\)?
    • definition: \(\widehat{\beta}\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X\)=1
    • here: \(\widehat{\beta}\) = 1.13 is the \(\triangle \widehat{\textrm{polityafter}}\) associated with \(\triangle\)died=1
    • in words: the death of the leader (i.e., an increase in died of 1 by going from died=0 to died=1) is associated with a predicted increase in polity scores after the assassination attempt of 1.13 points, on average
  • unit of measurement of \(\widehat{\beta}\)? same as \(\triangle \overline{Y}\); here, Y is nonbinary and measured in points so \(\triangle \overline{Y}\) and \(\widehat{\beta}\) are measured in points
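
As a quick check on this interpretation, the unrounded coefficients from the R output above can be plugged in for each value of died:

\[ \widehat{\textrm{polityafter}} = -1.895 + 1.132 \times 0 = -1.895 \quad \textrm{(died = 0)} \]

\[ \widehat{\textrm{polityafter}} = -1.895 + 1.132 \times 1 = -0.763 \quad \textrm{(died = 1)} \]

The difference between the two predictions, \(-0.763 - (-1.895) = 1.132\), is \(\widehat{\beta}\).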

  • Interpretation of \(\widehat{\beta}\)?
  • Since here \(X\) is the treatment variable and \(Y\) is the outcome variable of interest, \(\widehat{\beta}\) is equivalent to the difference-in-means estimator, so we should interpret \(\widehat{\beta}\) using causal language
  • Causal language: We estimate that the death of the leader increases polity scores after the assassination attempt by 1.13 points, on average
  • This should be a valid estimate of the average treatment effect if the assassination attempts where the leader died are comparable to those where the leader did not die. Is this true? Let’s see how the two groups compare to each other in terms of politybefore (a pre-treatment characteristic)

STEP 5: Identify confounding variables

  • Calculate the average politybefore for the two groups:
leaders |> group_by(died) |> summarise(mean_politybefore = mean(politybefore))
# A tibble: 2 × 2
   died mean_politybefore
  <int>             <dbl>
1     0            -1.74 
2     1            -0.704
  • Countries where the assassination attempt ended up being successful were, on average, slightly more democratic to begin with than countries where the assassination attempt ended up not being successful

  • politybefore might be a confounding variable: it may affect both the likelihood that the leader dies (died) and the outcome (polityafter)

STEP 6: Estimate average causal effect while controlling for confounders

  • To estimate the average treatment effect of the death of the leader while controlling for initial levels of democracy, we can fit the following model:

\[ \widehat{\textrm{polityafter}} = \widehat{\alpha} + \widehat{\beta}_1 \textrm{died} + \widehat{\beta}_2 \textrm{politybefore} \]

  • To fit the model, we use the function lm()

    • we specify as the main argument a formula of the type Y \(\sim\) X\(_1\) + X\(_2\)

lm(polityafter ~ died + politybefore, data=leaders) 

Call:
lm(formula = polityafter ~ died + politybefore, data = leaders)

Coefficients:
 (Intercept)          died  politybefore  
     -0.4346        0.2616        0.8375  
  • Fitted model:

\[ \widehat{\textrm{polityafter}} = -0.43 + 0.26 \, \textrm{died} + 0.84 \, \textrm{politybefore} \]

  • Interpretation of \(\widehat{\beta}_1\)?
    • definition: \(\widehat{\beta}_1\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X_1\)=1, while holding all other \(X\) variables constant
    • here: \(\widehat{\beta}_1\) = 0.26 is the \(\triangle \widehat{\textrm{polityafter}}\) associated with \(\triangle\)died=1, while holding politybefore constant
    • in words: the death of the leader is associated with a predicted increase in polity scores after the assassination attempt of 0.26 points, on average, while holding polity scores before constant
  • unit of measurement of \(\widehat{\beta}_1\)? same as \(\triangle \overline{Y}\); here, Y is nonbinary and measured in points so \(\triangle \overline{Y}\) and \(\widehat{\beta}_1\) are measured in points
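
To see what "holding politybefore constant" means in practice, the unrounded coefficients from the R output above can be used to compare predictions for two attempts with the same initial polity score. The value politybefore = -6 is picked only as an example:

\[ \widehat{\textrm{polityafter}} = -0.4346 + 0.2616 \times 0 + 0.8375 \times (-6) = -5.4596 \quad \textrm{(died = 0)} \]

\[ \widehat{\textrm{polityafter}} = -0.4346 + 0.2616 \times 1 + 0.8375 \times (-6) = -5.1980 \quad \textrm{(died = 1)} \]

The difference is \(0.2616\), that is, \(\widehat{\beta}_1\). Repeating the calculation with any other value of politybefore gives the same difference, because the politybefore term cancels out.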

  • Interpretation of \(\widehat{\beta}_1\)?
  • Since X\(_1\) is the treatment variable, \(Y\) is the outcome variable, and X\(_2\) is the confounder, we can interpret \(\widehat{\beta}_1\) using causal language
  • Causal language: We estimate the death of the leader increases polity scores after the assassination attempt by 0.26 points, on average, when holding polity scores before the assassination attempt constant
  • This should be a valid estimate of the average treatment effect if politybefore is the only confounder (and assuming the linear model we are using reflects the true relationship between all \(X\) variables and Y)

  • Note that once we control for politybefore the effect size decreases substantially (it goes from 1.13 to 0.26)
  • Based on this analysis, the death of the leader increases the level of democracy of a country by a small amount
    • more on this later in the semester
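
The drop from 1.13 to 0.26 can be related to the numbers computed in the earlier steps. A standard property of least-squares fits (not stated in the steps above, and assuming both models and the group means use the same observations) is that the coefficient from the simple model equals the controlled coefficient plus the coefficient on the control multiplied by the difference in the control's group means:

\[ \widehat{\beta}_{\textrm{simple}} = \widehat{\beta}_1 + \widehat{\beta}_2 \times \left( \overline{\textrm{politybefore}}_{\textrm{died}=1} - \overline{\textrm{politybefore}}_{\textrm{died}=0} \right) \]

\[ 0.2616 + 0.8375 \times \left( -0.704 - (-1.74) \right) = 0.2616 + 0.8375 \times 1.036 \approx 1.13 \]

Up to rounding in the printed output, this matches the 1.13 from STEP 4. Most of the original estimate reflected the fact that countries where the leader died were more democratic to begin with.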

Estimating Average Causal Effects Using Observational Data and Multiple Linear Regression Models

If, in the multiple linear regression model where \(X_{1}\) is the treatment variable, we control for all confounders by including them in the model as additional \(X\) variables, then we can interpret \(\widehat{\beta}_{1}\) as a valid estimate of the average causal effect of \(X_{1}\) on \(Y\)

This assumes, again, that

  • (a) we can identify and measure all confounders and

  • (b) the linear model we use reflects the true relationship between all the \(X\) variables and \(Y\).

Causal studies

  • So far we have learned how to estimate the average change in the outcome caused by the treatment
    • with experimental data: by computing the difference-in-means estimator directly or by fitting a simple linear regression model where X is the treatment variable (chapter 2 + chapter 5)
    • with observational data: by controlling for all confounding variables (chapter 5)
  • There are more issues we must consider when conducting or evaluating a scientific causal study, including the internal and external validity of the study

Internal Validity

  • Refers to the extent to which the causal assumptions are satisfied
  • It asks, is the estimated causal effect valid for the sample of observations in the study?
  • The answer depends on whether the treatment and control groups used for the estimation can be considered comparable, after statistical controls are applied (if any are). It depends on whether we have:
    • (a) eliminated all confounding variables by running a randomized experiment OR
    • (b) successfully controlled for all confounding variables when using observational data

External Validity

  • Refers to the extent to which the conclusions can be generalized
  • It asks, is the estimated causal effect valid beyond this particular study?
  • The answer depends on:
    • (i) whether the sample of observations in the study is representative of the population to which we want to generalize the results AND
    • (ii) whether the treatment used in the study is representative of the treatment for which we want to generalize the results

  • Randomized experiments tend to have strong internal validity but relatively weak external validity
    • random treatment assignment eliminates all potential confounding variables, BUT
    • sample of participants might not be representative of population and/or treatment might be unrealistic and not comparable to real-world treatments
  • Observational studies tend to have strong external validity but relatively weak internal validity
    • sample is usually representative of the population and treatment is usually realistic, BUT
    • possibility of uncontrolled confounding variables can’t be ruled out

  • This dynamic explains why scholars use both types of studies to estimate causal effects; they often have complementary strengths
  • Nonetheless, some studies based on experimental data have strong external validity and some studies based on observational data have strong internal validity
  • We should pay attention to the details of a study when evaluating its validity

The Role of Randomisation

  1. When selecting observations from the population into the sample, random sampling
  • ensures sample is representative of target population

  • ensures strong external validity (assuming the treatment is realistic)

  2. When deciding who receives the treatment and who doesn’t, random treatment assignment
  • eliminates confounders, making treatment and control groups comparable

  • ensures strong internal validity
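
Both roles of randomization can be sketched with simulated data. The population, the age variable, and the sample sizes below are hypothetical, chosen only for illustration.

```r
set.seed(269)

# Hypothetical target population with one pre-treatment characteristic
population <- data.frame(age = rnorm(100000, mean = 45, sd = 15))

# 1. Random sampling: the sample should resemble the population
sample_rows <- sample(nrow(population), size = 2000)
study <- population[sample_rows, , drop = FALSE]
population_mean <- mean(population$age)
sample_mean <- mean(study$age)

# 2. Random treatment assignment: shuffle 1000 zeros and 1000 ones
study$treated <- sample(rep(c(0, 1), each = 1000))

# Treatment and control groups should be similar in pre-treatment characteristics
balance_gap <- mean(study$age[study$treated == 1]) -
  mean(study$age[study$treated == 0])

print(c(population_mean = population_mean,
        sample_mean = sample_mean,
        balance_gap = balance_gap))
```

With samples of this size, the sample mean should be close to the population mean and the gap between groups should be close to zero. Small differences remain because of chance.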

Randomization Gives Us Super Powers

  • The ideal research design for estimating average treatment effects would make use of the two kinds of randomization we have seen
  • Assuming that we were also able to make the treatment as realistic as possible, this design would create a study with strong external and internal validity
  • For ethical, logistical, and financial reasons, few studies include both types of randomization
  • Let’s evaluate the causal studies we have seen thus far!

Does Social Pressure Affect Turnout?

(Based on Alan S. Gerber, Donald P. Green, and Christopher W. Larimer. 2008. “Social Pressure and Voter Turnout: Evidence from a Large-Scale Field Experiment.” American Political Science Review, 102 (1): 33-48.)

  • Experiment conducted in Michigan, where registered voters were randomly assigned to either (a) receive a message designed to induce social pressure to vote or (b) receive nothing
  • Internal validity: Strong
    • Why? This is a randomized experiment. Random treatment assignment should have eliminated all confounding variables. Registered voters who received the message should be comparable to registered voters who did not

  • External validity: Depends on population and treatment we want to generalize the results to
    • Could we generalize the results to using the same message with all voting-age residents in Michigan? Yes, if the sample of registered voters who were part of the study is representative of all voting-age residents in Michigan
    • Could we generalize the results to using the same message in Massachusetts? Depends on how different voting-age residents in Massachusetts are from the sample of registered voters in Michigan who were part of the study

Do Women Promote Different Policies than Men?

(Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. “Women as Policy Makers: Evidence from a Randomized Policy Experiment in India.” Econometrica, 72 (5): 1409–43.)

  • Experiment conducted in India, where rural villages were randomly assigned to have a female politician
  • Internal validity: Strong
    • Why? This is a randomized experiment. Random treatment assignment should have eliminated all confounding variables. Villages that were assigned to have a female politician should be comparable to villages that were not

  • External validity: Depends on population and treatment we want to generalize the results to
    • Could we generalize the results to having a female politician across the whole of India? Not really. Probably only to the rural areas that the sample of villages in the study is representative of
    • Could we generalize the results to having a female politician in U.S. towns? Absolutely not. Rural villages in India are not representative of U.S. towns

Does the Death of the Leader Increase the Level of Democracy?

(Based on Benjamin F. Jones and Benjamin A. Olken. 2009. “Hit or Miss? The Effect of Assassinations on Institutions and War.” American Economic Journal: Macroeconomics, 1 (2): 55-87.)

  • Observational data from assassination attempts of leaders around the world
  • Internal validity of study without controls: Weak
    • Why? This is NOT a randomized experiment, so we should worry about confounding variables, such as politybefore, making treatment and control groups not comparable. We just learnt that countries where the assassination attempt was successful were, on average, slightly more democratic to begin with than countries where it was not. Therefore, if we do not control for politybefore in the estimation process, we end up with an invalid estimate of the average treatment effect.

  • Internal validity of study with controls: Stronger than without controls (but unclear whether it is strong)
    • Why? Controlling for politybefore in the estimation process should help make treatment and control groups more comparable after controls are applied. However, there might still be confounders we have failed to observe and control for. Therefore, we cannot say for sure that the internal validity is strong.

  • External validity: Depends on population and treatment we want to generalise the results to
    • Could we generalise the results to the death of the leader in all countries? Probably not. We should probably only generalize to countries with assassination attempts (which tend to be less democratic to begin with).

Today’s class

  • How to Use Multiple Linear Regression Models to Control for Confounders and Estimate Average Treatment Effects Using Observational Data

  • Internal vs. External Validity