Week 3: Estimating Causal Effects with Randomized Experiments

POL269 Political Data Research

Javier Sajuria

05.02.2024

Why do we analyse data?

MEASURE: To infer population characteristics via survey research

  • what proportion of constituents support a particular policy?

PREDICT: To make predictions

  • who is the most likely candidate to win an upcoming election?

EXPLAIN: To estimate the causal effect of a treatment on an outcome

  • what is the effect of small classrooms on student performance?

  • We will progress from simple to more complex methods
  • We begin with EXPLAIN by learning how to estimate causal effects with randomized experiments
    • involves relatively simple maths
  • Then, we will learn how to MEASURE the characteristics of an entire population from a sample of survey respondents
    • visualizations, descriptive statistics, correlation
  • Then, we will learn how to PREDICT outcome variables
    • simple linear regression
  • Then, we will return to EXPLAIN and estimate causal effects with observational data
    • multiple linear regression

Plan for today

  • Causal Effects
  • Treatment and Outcome Variables
  • Individual Causal Effects
  • Average Causal Effects
  • Randomized Experiments
  • Difference-in-Means Estimator
  • Percentage points
  • Practical exercise

Does Social Pressure Affect Turnout?

Based on Alan S. Gerber, Donald P. Green, and Christopher W. Larimer. 2008. “Social Pressure and Voter Turnout: Evidence from a Large-Scale Field Experiment.” American Political Science Review, 102 (1): 33-48.

  • To answer, we will analyze data from a randomized experiment where registered voters in Michigan were randomly assigned to either
    • (a) receive a message designed to induce social pressure to vote, or
    • (b) receive nothing
  • The message told registered voters that after the election their neighbors would be informed about whether they voted in the election or not

Dear Registered Voter: WHAT IF YOUR NEIGHBORS KNEW WHETHER YOU VOTED? … We’re sending this mailing to you and your neighbours to publicize who does and does not vote. The chart shows the names of some of your neighbours, showing which have voted in the past. After the August 8 election, we intend to mail an updated chart. You and your neighbours will all know who voted and who did not. DO YOUR CIVIC DUTY–VOTE!

MAPEL DR Name Aug 2004 Nov 2004 Aug 2006
9993 JOSEPH JAMES SMITH Voted Voted ??
9995 JENNIFER KAY SMITH Didn’t vote Voted ??
9997 RICHARD B JACKSON Didn’t vote Voted ??
9999 KATHY MARIE JACKSON Didn’t vote Voted ??

The voting dataset

Unit of observation: Registered voters

Variables:

Variable   Description
birth      year of birth
message    whether registered voter received the message (“yes” or “no”)
voted      whether registered voter voted: 1 = yes; 0 = no

Causal Effects

Many of the most important research questions in politics involve estimating a causal effect:

  • Does foreign aid promote democratic government?
  • Do women promote different policies than men?
  • Do small classes improve student performance?
  • Does social pressure increase the probability of turning out to vote?

Causal Effects refer to the cause-and-effect connection between two variables:

  • treatment variable (X): variable whose change may produce a change in the outcome variable

  • outcome variable (Y): variable that may change as a result of a change in the treatment variable

The causal relationship we are interested in is:

\[ X \rightarrow Y \]

  • In the voting dataset we have three variables, birth, message, and voted, and we aim to answer the research question: “Does social pressure increase the probability of turning out to vote?”

  • What is the treatment variable?

    • message: indicates whether registered voter received the message inducing social pressure
  • What is the outcome variable?

    • voted: indicates whether registered voter voted
  • The causal relationship we are interested in is:

\[ message \rightarrow voted \]

Treatment variables

In this class, treatment variables will always be binary:

\[ \textrm{X}_i = \begin{cases} \textrm{1} \text{ if individual i takes the treatment} \\ \textrm{0} \text{ if individual i does not take the treatment}\end{cases} \]

In the voting experiment, the treatment variable is:

\[ \textrm{message}_i = \begin{cases} \textrm{1} \text{ if registered voter i received message} \\ \textrm{0} \text{ if registered voter i did not}\end{cases} \]

Based on whether the individual receives the treatment, we speak of two different conditions

  • treatment is the condition with the treatment: \(X_i{=}\textrm{1}\)

  • control is the condition without the treatment: \(X_i{=}\textrm{0}\)

Outcome variables

We will see different types of outcome variables

  • binary

  • non-binary

In the voting experiment, the outcome variable is:

\[ \textrm{voted}_i = \begin{cases} \textrm{1} \text{ if registered voter i voted}\\ \textrm{0} \text{ if registered voter i didn't vote}\end{cases} \]

What type of variable is voted?

Individual causal effects

The causal effect of X on Y is the change in the outcome variable caused by a change in the treatment variable

  • Ideally, we would like to compare two potential outcomes:
    • outcome when the treatment is present: \(Y_i(X_i=\textrm{1})\)
    • outcome when the treatment is absent: \(Y_i(X_i=\textrm{0})\)
  • If we could observe both potential outcomes for each individual \(i\), the individual causal effect would be:

\[ \triangle Y_i = Y_i(X_i{=}\textrm{1}) - Y_i(X_i{=}\textrm{0}) \]

  • \(\triangle Y_i\) represents the change in \(Y\) for individual \(i\)

  • In the voting experiment, we aim to measure the extent to which the probability of voting changes as a result of receiving the social pressure message

  • Ideally, for each registered voter we would like to observe:

    • whether they voted after receiving the social pressure message: voted\(_i\)(message\(_i\)=1)

    • whether they voted after NOT receiving the social pressure message: voted\(_i\)(message\(_i\)=0)
  • If this were possible, the effect of receiving the social pressure message on the probability of voting would be:

\[ \triangle \textrm{voted}_i = \textrm{voted}_i (\textrm{message}_i = \textrm{1}) - \textrm{voted}_i(\textrm{message}_i = \textrm{0}) \]

  • The result should be interpreted as an increase if positive, a decrease if negative, and as no effect if zero

Do we ever observe both potential outcomes for the same individual at the exact same time under the same circumstances?

  • We only observe the factual outcome: potential outcome under the condition received in reality

  • We can never observe the counterfactual outcome: potential outcome under the opposite condition as the one received in reality

  • Fundamental problem of causal inference: We can never observe the counterfactual outcome
  • As a result, we cannot measure causal effects at the individual level
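To make the distinction between factual and counterfactual outcomes concrete, here is a minimal sketch in R. It assumes we could somehow know both potential outcomes for five registered voters; all values and the names `toy`, `voted_if_message`, and `voted_if_no_message` are invented for illustration and are not part of the voting dataset.

```r
# Hypothetical potential outcomes for five registered voters (invented values)
toy <- data.frame(
  voted_if_message    = c(1, 1, 0, 1, 0), # potential outcome with the message
  voted_if_no_message = c(0, 1, 0, 0, 0), # potential outcome without the message
  message             = c(1, 0, 0, 1, 0)  # condition actually received
)

# Individual causal effects: only computable because we invented both columns
toy$effect <- toy$voted_if_message - toy$voted_if_no_message

# Factual outcome: the potential outcome under the condition received
toy$factual <- ifelse(toy$message == 1,
                      toy$voted_if_message,
                      toy$voted_if_no_message)

# Counterfactual outcome: the potential outcome under the opposite condition
toy$counterfactual <- ifelse(toy$message == 1,
                             toy$voted_if_no_message,
                             toy$voted_if_message)

toy # shows all columns side by side
```

In real data we would only ever have the `message` and `factual` columns, which is why the `effect` column cannot be computed for any individual.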

Average causal effects

  • To get around the fundamental problem of causal inference, we must find good approximations for the counterfactual outcomes

  • We move away from individual-level effects and focus on the average causal effects across a group of individuals

  • The average causal effect of the treatment X on the outcome Y (also known as the average treatment effect) is the average of all the individual causal effects of X on Y within a group

    • It is the average change in Y caused by a change in X for a group of individuals

  • How can we obtain good approximations for the counterfactual outcomes?

    • We must find or create a situation in which the observations treated and the observations untreated are, at the aggregate level, similar with respect to all the variables that might affect the outcome other than the treatment variable itself

    • Then, we can use the factual outcome of one group as a proxy for the counterfactual outcome of the other

  • The best way to accomplish this is by conducting a randomised experiment
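In symbols, the definition of the average causal effect given above can be sketched as follows, assuming the group contains \(n\) individuals indexed by \(i\) (the symbol \(n\) and the bar over \(\triangle Y\) are notation added here for illustration):

\[ \overline{\triangle Y} = \frac{1}{n} \sum_{i=1}^{n} \triangle Y_i = \frac{1}{n} \sum_{i=1}^{n} \left[ Y_i(X_i{=}\textrm{1}) - Y_i(X_i{=}\textrm{0}) \right] \]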

Randomised experiments

  • A randomised experiment is a type of study design in which treatment assignment is randomized

    • researchers decide who takes the treatment based on a random process such as the flip of a coin
  • Once treatment is administered, we differentiate between:

    • treatment group: observations that received the treatment
    • control group: observations that didn’t receive the treatment
  • In the voting experiment, what are the treatment and control groups?

Random treatment assignment makes the treatment and control groups on average identical to each other in all observed and unobserved pre-treatment characteristics

  • When treatment assignment is randomised, the only thing that distinguishes the treatment group from the control group, besides the treatment itself, is chance

    • although the treatment and control groups consist of different individuals, the two groups are, as a whole, comparable to each other in terms of their pre-treatment characteristics (characteristics before treatment was administered)

  • If the treatment and control groups are comparable before the treatment is administered

    • we can use the factual outcome of one group as a proxy for the counterfactual outcome of the other

    • we can estimate the average treatment effect by calculating the difference-in-means estimator

\[ \bar{Y}_\text{treatment group} - \bar{Y}_\text{control group} \]

\(\bar{Y}_\text{treatment group}\): average outcome for the treatment group

\(\bar{Y}_\text{control group}\): average outcome for the control group

  • Only when the treatment and control groups are comparable does the diffs-in-means estimator produce a valid estimate of the average treatment effect

    • \(\widehat{\textrm{average_effect}} = \bar{Y}_\text{treatment group} - \bar{Y}_\text{control group}\)

    • “hat” on top of the name denotes this is an estimate

  • In the voting experiment, since treatment was randomly assigned, we can assume that the treatment and control groups are comparable and, thus, can estimate the average causal effect of receiving the message on the probability of voting by using the diffs-in-means estimator:

\[ \overline{\textrm{voted}}_\text{treatment group} - \overline{\textrm{voted}}_\text{control group} \]

  • \(\overline{\textrm{voted}}_\text{treatment group}\): proportion of registered voters who voted among those who received the message

  • \(\overline{\textrm{voted}}_\text{control group}\): proportion of registered voters who voted among those who did not receive the message

    • why proportions and not averages? because voted is binary, so its average equals the proportion of observations for which voted equals 1

How to Run an Experiment

Random Treatment Assignment Makes Treatment and Control Groups Comparable When Sample Size is Large Enough (Link to interactive graph)
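The claim in this slide can be explored with a small simulation. The sketch below uses simulated years of birth (not the voting dataset) and a hypothetical helper, `balance_gap()`, that randomly assigns treatment and returns the difference in average year of birth between the two groups. The mean and standard deviation used to simulate the data are arbitrary assumptions.

```r
set.seed(269) # makes the random draws reproducible

# Difference in a pre-treatment characteristic between the treatment and
# control groups, for one randomly assigned sample of size n (simulated data)
balance_gap <- function(n) {
  birth <- round(rnorm(n, mean = 1956, sd = 15))       # simulated years of birth
  treated <- sample(c(0, 1), size = n, replace = TRUE) # coin flip per person
  mean(birth[treated == 1]) - mean(birth[treated == 0])
}

gap_small <- balance_gap(20)     # small sample: groups may differ noticeably
gap_large <- balance_gap(100000) # large sample: groups should be nearly identical

c(small = gap_small, large = gap_large)
```

With the larger sample, the gap in average year of birth should be very close to zero, which is what it means for the two groups to be comparable on average.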

What is a percentage point?

  • It is the unit of measurement for the arithmetic difference between two percentages:

\[ \% - \% = p.p. \]

  • Example: if a candidate’s vote share increases from 50% to 60%, we would state that the vote share increased by …

\[ \triangle\textrm{vshare} = \textrm{vshare}_{\textrm{final}} - \textrm{vshare}_{\textrm{initial}} = \textrm{60%} - \textrm{50%} = \textrm{10 p.p.} \]

  • Why not 10%? If someone told us that an initial vote share was 50% and that it increased by 10%, the final vote share would be _________ (instead of 60%)

    • What is 10% of 50%? \(\textrm{0.10}{\times}\textrm{50}=\textrm{5 p.p.}\)

\[ \textrm{vshare}_{\textrm{final}} = \textrm{vshare}_{\textrm{initial}} + \triangle \textrm{vshare} = \textrm{50%} + \textrm{5 p.p.} = \textrm{55%} \]
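The same distinction can be checked with a few lines of R. This is a small sketch using the vote shares from the example above; the object names are chosen here for illustration.

```r
vshare_initial <- 50 # initial vote share, in %
vshare_final   <- 60 # final vote share, in %

# Change in percentage points: the arithmetic difference between two percentages
change_pp <- vshare_final - vshare_initial

# Change in percent: the difference relative to the starting value
change_pct <- (vshare_final - vshare_initial) / vshare_initial * 100

change_pp  # 10 (percentage points)
change_pct # 20 (percent)
```

Going from 50% to 60% is therefore an increase of 10 percentage points, but an increase of 20 percent relative to the initial vote share.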

REVIEW: Unit of Measurement of Means

Unit of Measurement of the Diffs-in-Means Estimator

  • Formula of the difference-in-means estimator in words?
    • Average outcome for the treatment group - Average outcome for the control group
  • If the outcome variable is binary, in what unit of measurement will the average outcomes be?
    • percentages (after multiplying the decimal by 100)
  • What will be the unit of measurement of estimator?
    • percentage points (% - % = p.p.)
  • Do we need to multiply the result by 100?
    • yes!
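To see why the mean of a binary variable can be read as a percentage, consider this short sketch with five invented observations (the vector `voted_toy` is hypothetical, not taken from the voting dataset).

```r
voted_toy <- c(1, 0, 0, 1, 0) # invented binary outcomes: 1 = voted, 0 = didn't vote

# Mean of the binary variable
mean_voted <- mean(voted_toy)

# Proportion of observations that equal 1, computed by hand
prop_voted <- sum(voted_toy == 1) / length(voted_toy)

mean_voted       # 0.4
prop_voted       # 0.4, the same value
mean_voted * 100 # 40, i.e. 40% of these observations voted
```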

Does Social Pressure Affect Turnout?

  • To answer, we will analyse data from an experiment where registered voters were randomly assigned to either
    • (a) receive a message designed to induce social pressure to vote, or
    • (b) receive nothing
  • The message told registered voters that after the election their neighbours would be informed about whether they voted in the election or not

  • What do we need to calculate to estimate the average causal effect of receiving the message on the probability of turning out to vote?
    • the difference-in-means estimator
  • Why does the difference-in-means estimator provide us with a valid estimate of the average treatment effect?
    • because the data come from a randomised experiment (where treatment was randomly assigned)
    • as a result, treatment and control groups are comparable and we can use the factual outcome of one group as a proxy for the counterfactual outcome of the other

  • In this case, the difference-in-means estimator is:
  • Answer: \[ \overline{\textrm{voted}}_\text{treatment group} - \overline{\textrm{voted}}_\text{control group} \]
  • \(\overline{\textrm{voted}}_\text{treatment group}\): proportion of registered voters who voted among those who received the message
  • \(\overline{\textrm{voted}}_\text{control group}\): proportion of registered voters who voted among those who did not receive the message

In-class exercise

  1. Open RStudio (RStudio will open R)
  2. Download exercise_2.R and voting.csv from the website (pol269.sajuria.com/datasets)
  3. Open exercise_2.R from within RStudio
    1. RStudio: File >> Open File
  4. Run code from steps 1-3
    1. use the setwd() for your computer

## STEP 1. Set the working directory
setwd("~/Desktop/POL269") # example for Mac 
setwd("C:/user/Desktop/POL269") # example for Windows
## STEP 2. Load the dataset
voting <- read.csv("voting.csv") # reads and stores data
## STEP 3. Look at the data
head(voting) # shows the first six observations
##   birth message voted
## 1  1981      no     0
## 2  1959      no     1
## 3  1956      no     1
## 4  1939     yes     1
## 5  1968      no     0
## 6  1967      no     0
## what does each observation represent?
## what is the outcome variable?
## what is the treatment variable?

Step 4. Create binary treatment variable

OPTION 1: ifelse()

  • First, we need to learn how to use == and ifelse()

  • The operator ==

    • is used to create logical tests that evaluate whether the observations of a variable equal a particular value (the particular values should be in quotes if text but without quotes if numbers)

    • examples:

      • data$variable == 1

      • data$variable == "yes"

  • The function ifelse()
    • creates the contents of a new variable based on the values of an existing one

    • requires three arguments, separated by commas, in the following order:

      (1) logical test (using ==)

      (2) return value if logical test is true,

      (3) return value if logical test is false

    • example: ifelse(data$variable == "yes", 1, 0)
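Before applying these tools to the voting dataset, it may help to see what each one returns on its own. The sketch below uses a small invented vector, `answer`, which is not part of the dataset.

```r
answer <- c("yes", "no", "no", "yes") # invented text values

# The == operator returns TRUE or FALSE for each observation
is_yes <- answer == "yes"
is_yes # TRUE FALSE FALSE TRUE

# ifelse() turns the result of the logical test into the values we choose
binary <- ifelse(answer == "yes", 1, 0)
binary # 1 0 0 1
```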

## STEP 4. Create binary treatment variable using ifelse()
## create variable pressure inside dataframe voting
voting$pressure <- # stores return values in new variable
  ifelse(voting$message=="yes", # logical test
         1, # return value if logical test is true
         0) # return value if logical test is false
  • You need to run the code all at once (not line by line)

  • Remember that R will ignore anything that follows the # sign, until the end of the line

  • What would have happened had we not added voting in front of pressure on the first line of code above?

  • Whenever we create a new variable, we should make sure it was created correctly by looking at the first few observations of the dataframe again

    head(voting) # shows first observations
    ##   birth message voted pressure
    ## 1  1981      no     0        0
    ## 2  1959      no     1        0
    ## 3  1956      no     1        0
    ## 4  1939     yes     1        1
    ## 5  1968      no     0        0
    ## 6  1967      no     0        0
  • Note that when message equals “yes”, pressure equals 1; and when message equals “no”, pressure equals 0

OPTION 2: case_when()

  • This is the tidyverse option, and uses piping (%>%)

  • Its structure is similar to that of ifelse(), but the default value must be specified with a final TRUE condition.

    • example:

      data <- data %>% mutate(new_variable = case_when(variable == "yes" ~ 1, TRUE ~ 0))

library(tidyverse)          # Load the tidyverse package

voting <- voting %>%        # Store the result 
  mutate(                   # Use mutate() to create a new variable
    pressure2 = case_when(  # new variable is called pressure2
      message == "yes" ~ 1,
      TRUE ~ 0              # if the above condition is not true 
    )
  )
  • This option is slightly longer, but it remains consistent with the use of tidyverse

  • You can use either option; both produce the same variable.

head(voting)
##   birth message voted pressure pressure2
## 1  1981      no     0        0         0
## 2  1959      no     1        0         0
## 3  1956      no     1        0         0
## 4  1939     yes     1        1         1
## 5  1968      no     0        0         0
## 6  1967      no     0        0         0
  • Note that pressure and pressure2 are the same.

STEP 5. Compute the difference-in-means estimator

\[ \overline{Y}_\text{treatment group} - \overline{Y}_\text{control group} \]

\(\overline{Y}_\text{treatment group}\): average outcome for the treatment group

\(\overline{Y}_\text{control group}\): average outcome for the control group

  • In the voting experiment, since voted is the outcome variable, the difference-in-means estimator is:

\[ \overline{\textrm{voted}}_\text{treatment group} - \overline{\textrm{voted}}_\text{control group} \]

  • Let’s start by practicing computing and interpreting means
mean(voting$voted)
## [1] 0.3101759
# OR
voting %>% summarise(mean = mean(voted))
##        mean
## 1 0.3101759
  • Interpretation?
    • 31% of all the registered voters who were part of the experiment voted
  • Why in %?
    • Because voted is binary
    • Recall: The mean of a binary variable should be interpreted in % (after multiplying the output by 100)

  • mean(voting$voted) or summarise(mean = mean(voted)) compute the mean of voted for ALL the observations in the dataset
  • To compute the difference-in-means estimator, we need to calculate the mean of voted for two subsets of observations:
    • the mean of voted for the treatment group, only for the observations that were treated (for which pressure equals 1)
    • the mean of voted for the control group, only for the observations that were not treated (for which pressure equals 0)
    • To do this, we need to learn how to use the [] operator and the group_by() function

  • Operator []:
    • extracts a selection of observations from a variable

    • to its left, we specify the variable we want to subset

    • inside the square brackets, we specify the criteria of selection; we can specify a logical test using the relational operator ==; only the observations for which the test is true will be extracted

    • example: data$var1[data$var2==1]

      # extracts the observations of the variable var1 for which the variable var2 equals 1

  • group_by() function:

    • It groups the observations according to the values of a variable

    • We then pass the grouped data to summarise() to compute the mean within each group

    • example:

      data %>% group_by(var2) %>% summarise(mean = mean(var1))
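To see how the [] operator works step by step, here is a minimal sketch on a small invented dataframe called `toy` (the values are made up for illustration). It shows the logical test on its own, the observations it extracts, and the resulting means.

```r
# Invented data: var1 is a numeric variable, var2 is a binary variable
toy <- data.frame(
  var1 = c(10, 20, 30, 40),
  var2 = c(1, 0, 1, 0)
)

# The logical test inside the brackets: TRUE where var2 equals 1
toy$var2 == 1 # TRUE FALSE TRUE FALSE

# Only the observations of var1 for which the test is TRUE are extracted
selected <- toy$var1[toy$var2 == 1]
selected # 10 30

# Mean of var1 within each subset
mean_when_1 <- mean(toy$var1[toy$var2 == 1]) # 20
mean_when_0 <- mean(toy$var1[toy$var2 == 0]) # 30
```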

  • Compute the mean of voted for the treatment and control groups, separately

    mean(voting$voted[voting$pressure==1]) # treatment
    ## [1] 0.3779482
    mean(voting$voted[voting$pressure==0]) # control
    ## [1] 0.2966383
    # OR
    voting %>% group_by(pressure) %>%  summarise(mean = mean(voted))
    ## # A tibble: 2 × 2
    ##   pressure  mean
    ##      <dbl> <dbl>
    ## 1        0 0.297
    ## 2        1 0.378
  • Interpretation of the first mean?

    • 38% of the registered voters who received the message voted (0.38x100=38%)
  • Interpretation of the second mean?

    • 30% of the registered voters who did not receive the message voted (0.30x100=30%)

  • Now, we can compute the difference-in-means estimator as the difference between the two means above:

    mean(voting$voted[voting$pressure==1]) -
      mean(voting$voted[voting$pressure==0]) 
    ## [1] 0.08130991
  • direction, size, and unit of measurement of the effect?

    • increase of 8 percentage points
  • increase because we are measuring a change in \(Y\) and the number is positive

  • percentage points because it is the result of subtracting two percentages: %-% = p.p. (because voted is binary)

  • 8 (and not 0.08) because we need to multiply the number by 100 to turn it into p.p. (because voted is binary)

  • 38% - 30% = 8 p.p.
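The full calculation, including the conversion to percentage points, can be sketched on a tiny invented dataset. The dataframe `toy` below is hypothetical and only mimics the structure of the voting dataset; storing each mean in its own object is a choice made here for readability.

```r
# Invented data with the same structure as the voting dataset
toy <- data.frame(
  pressure = c(1, 1, 1, 1, 0, 0, 0, 0), # 1 = treatment group, 0 = control group
  voted    = c(1, 1, 1, 0, 1, 0, 0, 0)  # 1 = voted, 0 = didn't vote
)

# Proportion who voted in each group
mean_treatment <- mean(toy$voted[toy$pressure == 1]) # 0.75
mean_control   <- mean(toy$voted[toy$pressure == 0]) # 0.25

# Difference-in-means estimator, as a decimal
diff_in_means <- mean_treatment - mean_control # 0.5

# Multiply by 100 to express the estimate in percentage points
diff_in_means * 100 # 50
```

In this invented example, 75% - 25% = 50 p.p., following the same logic as the 8 p.p. estimate from the real data.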

STEP 6. Write conclusion statement

  • What’s the assumption we are making when estimating the average causal effect? A: registered voters who received the message are comparable to registered voters who did not
  • Why is this a reasonable assumption? A: the data come from a randomised experiment
  • What’s the treatment? A: receiving the message inducing social pressure
  • What’s the outcome? A: probability of voting
  • What’s the direction, size, and unit of measurement of the average causal effect? A: an increase of 8 percentage points, on average

CONCLUSION STATEMENT

Assuming that [the treatment and control groups are comparable] (a reasonable assumption because …), we estimate that [the treatment] [increases/decreases] [the outcome] by [size and unit of measurement of the effect], on average.

Assuming that registered voters who received the message are comparable to the registered voters who did not (a reasonable assumption because the data come from a randomized experiment), we estimate that receiving the message inducing social pressure increases the probability of voting by 8 percentage points, on average.

Today’s class

  • Causal Effects
  • Treatment and Outcome Variables
  • Individual vs. Average Causal Effects
  • Randomized Experiments
  • Difference-in-Means Estimator
  • Units of Measurement of Means and Diffs-in-Means
  • In-Class Exercise: Does Social Pressure Increase the Probability of Turning Out To Vote?
  • How to Write a Conclusion Statement
  • R: ==, ifelse(), group_by(), summarise(), mean()