Week 3: Estimating Causal Effects with Randomized Experiments

POL269 Political Data Research

Javier Sajuria

05.02.2024

Why do we analyse data?

MEASURE: To infer population characteristics via survey research

what proportion of constituents support a particular policy?

PREDICT: To make predictions

who is the most likely candidate to win an upcoming election?

EXPLAIN: To estimate the causal effect of a treatment on an outcome

what is the effect of small classrooms on student performance?

We will progress from simple to more complex methods
We begin with EXPLAIN by learning how to estimate causal effects with randomized experiments
- involves relatively simple maths
Then, we will learn how to MEASURE the characteristics of an entire population from a sample of survey respondents
- visualizations, descriptive statistics, correlation
Then, we will learn how to PREDICT outcome variables
- simple linear regression
Then, we will return to EXPLAIN and estimate causal effects with observational data
- multiple linear regression

Plan for today

Causal Effects
Treatment and Outcome Variables
Individual Causal Effects
Average Causal Effects
Randomized Experiments
Difference-in-Means Estimator
Percentage points
Practical exercise

To answer whether social pressure affects turnout, we will analyze data from a randomized experiment where registered voters in Michigan were randomly assigned to either
- (a) receive a message designed to induce social pressure to vote, or
- (b) receive nothing
The message told registered voters that after the election their neighbors would be informed about whether they voted in the election or not

Dear Registered Voter: WHAT IF YOUR NEIGHBORS KNEW WHETHER YOU VOTED? … We’re sending this mailing to you and your neighbours to publicize who does and does not vote. The chart shows the names of some of your neighbours, showing which have voted in the past. After the August 8 election, we intend to mail an updated chart. You and your neighbours will all know who voted and who did not. DO YOUR CIVIC DUTY–VOTE!

MAPEL DR	Name	Aug 2004	Nov 2004	Aug 2006
9993	JOSEPH JAMES SMITH	Voted	Voted	??
9995	JENNIFER KAY SMITH	Didn’t vote	Voted	??
9997	RICHARD B JACKSON	Didn’t vote	Voted	??
9999	KATHY MARIE JACKSON	Didn’t vote	Voted	??

The voting dataset

Unit of observation: Registered voters

Variables:

Variable	Description
birth	year of birth
message	whether registered voter received the message (“yes” or “no”)
voted	whether registered voter voted (1 = yes; 0 = no)

Causal Effects

Many of the most important research questions in politics involve estimating a causal effect:

Does foreign aid promote democratic government?
Do women promote different policies than men?
Do small classes improve student performance?
Does social pressure increase the probability of turning out to vote?

Causal Effects refer to the cause-and-effect connection between two variables:

treatment variable (X): variable whose change may produce a change in the outcome variable
outcome variable (Y): variable that may change as a result of a change in the treatment variable

The causal relationship we are interested in is

\[ X \rightarrow Y \]

In the voting dataset we have three variables, birth, message, and voted, and we aim to answer the research question: “Does social pressure increase the probability of turning out to vote?”
What is the treatment variable?
- message: indicates whether registered voter received the message inducing social pressure
What is the outcome variable?
- voted: indicates whether registered voter voted
The causal relationship we are interested in is:

\[ message \rightarrow voted \]

Treatment variables

In this class, treatment variables will always be binary:

\[ \textrm{X}_i = \begin{cases} \textrm{1} \text{ if individual i takes the treatment} \\ \textrm{0} \text{ if individual i does not take the treatment}\end{cases} \]

In the voting experiment, the treatment variable is:

\[ \textrm{message}_i = \begin{cases} \textrm{1} \text{ if registered voter i received message} \\ \textrm{0} \text{ if registered voter i did not}\end{cases} \]

Based on whether the individual receives the treatment, we speak of two different conditions

treatment is the condition with the treatment: $X_i{=}\textrm{1}$
control is the condition without the treatment: $X_i{=}\textrm{0}$

Outcome variables

We will see different types of outcome variables

binary
non-binary

In the voting experiment, the outcome variable is:

\[ \textrm{voted}_i = \begin{cases} \textrm{1} \text{ if registered voter i voted}\\ \textrm{0} \text{ if registered voter i didn't vote}\end{cases} \]

what type of variable is voted?

Individual causal effects

The causal effect of X on Y is the change in the outcome variable caused by a change in the treatment variable

Ideally, we would like to compare two potential outcomes:
- outcome when the treatment is present: $Y_i(X_i=\textrm{1})$
- outcome when the treatment is absent: $Y_i(X_i=\textrm{0})$
If we could observe both potential outcomes for each individual $i$, the individual causal effect would be:

\[ \triangle Y_i = Y_i(X_i{=}\textrm{1}) - Y_i(X_i{=}\textrm{0}) \]

$\triangle Y_i$ represents the change in $Y$ for individual $i$

In the voting experiment, we aim to measure the extent to which the probability of voting changes as a result of receiving the social pressure message
Ideally, for each registered voter we would like to observe:
- whether they voted after receiving the social pressure message: voted$_i$(message$_i$=1)
- whether they voted after NOT receiving the social pressure message: voted$_i$(message$_i$=0)
If this were possible, the effect of receiving the social pressure message on the probability of voting would be:

\[ \triangle \textrm{voted}_i = \textrm{voted}_i (\textrm{message}_i = \textrm{1}) - \textrm{voted}_i(\textrm{message}_i = \textrm{0}) \]

$\triangle \textrm{voted}_i$ should be interpreted as an increase if positive, a decrease if negative, and no effect if zero

Do we ever observe both potential outcomes for the same individual at the exact same time under the same circumstances?

We only observe the factual outcome: potential outcome under the condition received in reality
We can never observe the counterfactual outcome: potential outcome under the condition opposite to the one received in reality

Fundamental problem of causal inference: We can never observe the counterfactual outcome
As a result, we cannot measure causal effects at the individual level
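
A small sketch may make the fundamental problem concrete. The four voters and their potential outcomes below are hypothetical (invented for illustration, not taken from the voting dataset). Only in a made-up world like this one can we see both potential outcomes side by side.

```r
# Hypothetical potential outcomes for four registered voters (invented values)
po <- data.frame(
  voter     = 1:4,
  voted_if1 = c(1, 1, 0, 1), # would vote if they received the message
  voted_if0 = c(0, 1, 0, 0), # would vote if they did not receive it
  message   = c(1, 0, 1, 0)  # condition actually received
)

# Individual causal effects: computable only because the data are invented
po$effect <- po$voted_if1 - po$voted_if0

# In reality we would observe only the factual outcome
po$observed <- ifelse(po$message == 1, po$voted_if1, po$voted_if0)

po
```

In a real dataset only the columns message and observed would exist, so the column effect could not be computed.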

Average causal effects

To get around the fundamental problem of causal inference, we must find good approximations for the counterfactual outcomes
We move away from individual-level effects and focus on the average causal effects across a group of individuals
The average causal effect of the treatment X on the outcome Y (also known as the average treatment effect) is the average of all the individual causal effects of X on Y within a group
- It is the average change in Y caused by a change in X for a group of individuals
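
In symbols, and reusing the notation for individual causal effects, this definition can be sketched as follows. The symbol $n$ is an assumption introduced here for the number of individuals in the group, and $\overline{\triangle Y}$ is a label chosen here for the average causal effect:

\[ \overline{\triangle Y} = \frac{1}{n}\sum_{i=1}^{n} \triangle Y_i = \frac{1}{n}\sum_{i=1}^{n} \left[ Y_i(X_i{=}\textrm{1}) - Y_i(X_i{=}\textrm{0}) \right] \]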

How can we obtain good approximations for the counterfactual outcomes?
- We must find or create a situation in which the observations treated and the observations untreated are, at the aggregate level, similar with respect to all the variables that might affect the outcome other than the treatment variable itself
- Then, we can use the factual outcome of one group as a proxy for the counterfactual outcome of the other
The best way to accomplish this is by conducting a randomised experiment

Randomised experiments

A randomised experiment is a type of study design in which treatment assignment is randomized
- researchers decide who takes the treatment based on a random process such as the flip of a coin
Once treatment is administered, we differentiate between:
- treatment group: observations that received the treatment
- control group: observations that didn’t receive the treatment
In the voting experiment, what are the treatment and control groups?

Random treatment assignment makes the treatment and control groups on average identical to each other in all observed and unobserved pre-treatment characteristics

When treatment assignment is randomised, the only thing that distinguishes the treatment group from the control group, besides the treatment itself, is chance
- although the treatment and control groups consist of different individuals, the two groups are, as a whole, comparable to each other in terms of their pre-treatment characteristics (characteristics before treatment was administered)

If the treatment and control groups are comparable before the treatment is administered
- we can use the factual outcome of one group as a proxy for the counterfactual outcome of the other
- we can estimate the average treatment effect by calculating the difference-in-means estimator

\[ \bar{Y}_\text{treatment group} - \bar{Y}_\text{control group} \]

$\bar{Y}_\text{treatment group}$: average outcome for the treatment group

$\bar{Y}_\text{control group}$: average outcome for the control group

Only when the treatment and control groups are comparable does the diffs-in-means estimator produce a valid estimate of the average treatment effect
- $\widehat{\textrm{average_effect}} = \bar{Y}_\text{treatment group} - \bar{Y}_\text{control group}$
- “hat” on top of the name denotes this is an estimate
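
To see why the estimator works under random assignment, here is a minimal simulation sketch. Everything in it is hypothetical: the 10,000 simulated individuals, their potential outcomes, and the voting probabilities of 30% and 40% are invented so that the estimate can be compared with a known answer.

```r
set.seed(269) # makes the random draws reproducible

n <- 10000
# Invented potential outcomes: about 30% would vote without the treatment,
# about 40% with it
y0 <- rbinom(n, size = 1, prob = 0.30)
y1 <- rbinom(n, size = 1, prob = 0.40)
true_effect <- mean(y1 - y0) # knowable only because the data are simulated

# Random treatment assignment: a coin flip for every individual
x <- rbinom(n, size = 1, prob = 0.5)

# We observe only the factual outcome
y <- ifelse(x == 1, y1, y0)

# Difference-in-means estimator
estimate <- mean(y[x == 1]) - mean(y[x == 0])

true_effect
estimate
```

The two printed numbers should be close to each other, although not identical, because the estimate also reflects chance differences between the two groups.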

In the voting experiment, since treatment was randomly assigned, we can assume that the treatment and control groups are comparable and, thus, can estimate the average causal effect of receiving the message on the probability of voting by using the diffs-in-means estimator:

\[ \overline{\textrm{voted}}_\text{treatment group} - \overline{\textrm{voted}}_\text{control group} \]

$\overline{\textrm{voted}}_\text{treatment group}$: proportion of registered voters who voted among those who received the message
$\overline{\textrm{voted}}_\text{control group}$: proportion of registered voters who voted among those who did not receive the message
- why proportions and not averages? because voted is binary, so the average of voted equals the proportion of observations for which voted equals 1
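
A quick check of this point, using ten hypothetical outcomes (invented values, not the voting data):

```r
# Hypothetical outcomes for ten registered voters (1 = voted, 0 = did not)
voted_example <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)

mean(voted_example)                             # average of the 0s and 1s
sum(voted_example == 1) / length(voted_example) # share of observations equal to 1
mean(voted_example) * 100                       # the same quantity in %
```

The first two lines return the same number, which is why the mean of a binary variable can be read as a proportion.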

How to Run an Experiment

Random Treatment Assignment Makes Treatment and Control Groups Comparable When Sample Size is Large Enough (Link to interactive graph)
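
The role of sample size can be sketched with a short simulation. The birth years below are invented, and balance_gap() is a hypothetical helper written for this illustration, not part of the course materials.

```r
set.seed(2024) # makes the random draws reproducible

# Returns the difference in mean birth year between the treatment and
# control groups after randomly assigning n invented individuals
balance_gap <- function(n) {
  birth   <- sample(1920:1986, size = n, replace = TRUE) # invented birth years
  treated <- sample(rep(c(0, 1), length.out = n))        # random assignment
  mean(birth[treated == 1]) - mean(birth[treated == 0])
}

gap_small <- balance_gap(20)     # small sample: the groups may differ noticeably
gap_large <- balance_gap(100000) # large sample: the gap should be close to zero

gap_small
gap_large
```

With only 20 individuals the two groups can differ by several years purely by chance; with 100,000 the difference tends to be a small fraction of a year.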

What is a percentage point?

It is the unit of measurement for the arithmetic difference between two percentages:

\[ \% - \% = p.p. \]

Example: if a candidate’s vote share increases from 50% to 60%, we would state that the vote share increased by …

\[ \triangle\textrm{vshare} = \textrm{vshare}_{\textrm{final}} - \textrm{vshare}_{\textrm{initial}} = \textrm{60\%} - \textrm{50\%} = \textrm{10 p.p.} \]

Why not 10%? If someone told us that an initial vote share was 50% and that it increased by 10%, the final vote share would be _________ (instead of 60%)
- What is 10% of 50%? $\textrm{0.10}{\times}\textrm{50}=\textrm{5 p.p.}$

\[ \textrm{vshare}_{\textrm{final}} = \textrm{vshare}_{\textrm{initial}} + \triangle \textrm{vshare} = \textrm{50\%} + \textrm{5 p.p.} = \textrm{55\%} \]
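
The same arithmetic can be checked in R. This is a sketch using the vote shares from the example; the object names are chosen here for illustration.

```r
vshare_initial <- 50 # initial vote share, in %
vshare_final   <- 60 # final vote share, in %

# Change in percentage points: arithmetic difference between two percentages
change_pp <- vshare_final - vshare_initial

# Change in percent: the difference relative to the starting value
change_pct <- (vshare_final - vshare_initial) / vshare_initial * 100

change_pp  # 10 (percentage points)
change_pct # 20 (percent)
```

Going from 50% to 60% is therefore an increase of 10 percentage points, which is not the same as an increase of 10 percent.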

REVIEW: Unit of Measurement of Means

Unit of Measurement of the Diffs-in-Means Estimator

Formula of the difference-in-means estimator in words?
- Average outcome for the treatment group - Average outcome for the control group
If the outcome variable is binary, in what unit of measurement will the average outcomes be?
- percentages (after multiplying the decimal by 100)
What will be the unit of measurement of estimator?
- percentage points (% - % = p.p.)
Do we need to multiply the result by 100?
- yes!

Unit of Measurement of the Diffs-in-Means Estimator

What do we need to calculate to estimate the average causal effect of receiving the message on the probability of turning out to vote?
- the difference-in-means estimator
Why does the difference-in-means estimator provide us with a valid estimate of the average treatment effect?
- because the data come from a randomised experiment (where treatment was randomly assigned)
- as a result, treatment and control groups are comparable and we can use the factual outcome of one group as a proxy for the counterfactual outcome of the other

In this case, the difference-in-means estimator is:
Answer: \[ \overline{\textrm{voted}}_\text{treatment group} - \overline{\textrm{voted}}_\text{control group} \]
$\overline{\textrm{voted}}_\text{treatment group}$: proportion of registered voters who voted among those who received the message
$\overline{\textrm{voted}}_\text{control group}$: proportion of registered voters who voted among those who did not receive the message

In-class exercise

Open RStudio (RStudio will open R)
Download exercise_2.R and voting.csv from the website (pol269.sajuria.com/datasets)
Open exercise_2.R from within RStudio
1. RStudio: File >> Open File
Run code from steps 1-3
1. use setwd() with the path for your computer

## STEP 1. Set the working directory
setwd("~/Desktop/POL269") # example for Mac 
setwd("C:/user/Desktop/POL269") # example for Windows

## STEP 2. Load the dataset
voting <- read.csv("voting.csv") # reads and stores data

## STEP 3. Look at the data
head(voting) # shows the first six observations
##   birth message voted
## 1  1981      no     0
## 2  1959      no     1
## 3  1956      no     1
## 4  1939     yes     1
## 5  1968      no     0
## 6  1967      no     0
## what does each observation represent?
## what is the outcome variable?
## what is the treatment variable?

Step 4. Create binary treatment variable

OPTION 1: ifelse()

First, we need to learn how to use == and ifelse()
The operator ==
- is used to create logical tests that evaluate whether the observations of a variable equal a particular value (the particular values should be in quotes if text but without quotes if numbers)
- examples:
  - data$variable == 1
  - data$variable == "yes"

The function ifelse()
- creates the contents of a new variable based on the values of an existing one
- requires three arguments, separated by commas, in the following order:
  
  (1) logical test (using ==)
  
  (2) return value if logical test is true,
  
  (3) return value if logical test is false
- example: ifelse(data$variable == "yes", 1, 0)
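
Before applying these to the voting data, it may help to try both on a tiny example. The vector answers below is hypothetical and only serves to show what each piece returns.

```r
# Hypothetical toy vector (not the voting data)
answers <- c("yes", "no", "no", "yes")

answers == "yes"               # logical test: TRUE where the value is "yes"
ifelse(answers == "yes", 1, 0) # 1 where the test is TRUE, 0 otherwise
```

The first line returns TRUE FALSE FALSE TRUE and the second returns 1 0 0 1.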

## STEP 4. Create binary treatment variable using ifelse()
## create variable pressure inside dataframe voting
voting$pressure <- # stores return values in new variable
  ifelse(voting$message=="yes", # logical test
         1, # return value if logical test is true
         0) # return value if logical test is false

You need to run the code all at once (not line by line)
Remember that R will ignore anything that follows the # sign, until the end of the line
What would have happened had we not added voting in front of pressure on the first line of code above?

Whenever we create a new variable, we should make sure it was created correctly by looking at the first few observations of the dataframe again

head(voting) # shows first observations
##   birth message voted pressure
## 1  1981      no     0        0
## 2  1959      no     1        0
## 3  1956      no     1        0
## 4  1939     yes     1        1
## 5  1968      no     0        0
## 6  1967      no     0        0

Note that when message equals “yes”, pressure equals 1; and when message equals “no”, pressure equals 0
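
Looking at six rows is a good start, but a cross-tabulation checks every observation at once. The sketch below rebuilds the six rows shown by head(voting) so that it runs on its own (voting_head is a name chosen here); with the full dataset the same idea would apply to voting itself. table() is a base R function not otherwise used in these slides.

```r
# Rebuild the six rows shown by head(voting) so the check runs on its own
voting_head <- data.frame(
  birth   = c(1981, 1959, 1956, 1939, 1968, 1967),
  message = c("no", "no", "no", "yes", "no", "no"),
  voted   = c(0, 1, 1, 1, 0, 0)
)
voting_head$pressure <- ifelse(voting_head$message == "yes", 1, 0)

# Cross-tabulate the old and new variables: every "no" row should fall
# under 0 and every "yes" row under 1
check <- table(voting_head$message, voting_head$pressure)
check
```

If the new variable was created correctly, the cells for ("no", 1) and ("yes", 0) are both zero.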

OPTION 2: case_when()

This is the tidyverse option, and uses piping (%>%)
It has a similar structure to ifelse(), but requires you to specify the default value using TRUE as the last condition. It is used inside mutate().
- example:
  
  data <- data %>% mutate(new_variable = case_when(variable == "yes" ~ 1, TRUE ~ 0))

library(tidyverse)          # Load the tidyverse package

voting <- voting %>%        # Store the result 
  mutate(                   # Use mutate() to create a new variable
    pressure2 = case_when(  # new variable is called pressure2
      message == "yes" ~ 1,
      TRUE ~ 0              # if the above condition is not true 
    )
  )

This option is slightly longer, but it is consistent with the rest of the tidyverse
You can choose either option; the results are the same.

head(voting)

  birth message voted pressure pressure2
1  1981      no     0        0         0
2  1959      no     1        0         0
3  1956      no     1        0         0
4  1939     yes     1        1         1
5  1968      no     0        0         0
6  1967      no     0        0         0

Note that pressure and pressure2 are the same.

STEP 5. Compute the difference-in-means estimator

\[ \overline{Y}_\text{treatment group} - \overline{Y}_\text{control group} \]

$\overline{Y}_\text{treatment group}$: average outcome for the treatment group

$\overline{Y}_\text{control group}$: average outcome for the control group

In the voting experiment, since voted is the outcome variable, the difference-in-means estimator is:

\[ \overline{\textrm{voted}}_\text{treatment group} - \overline{\textrm{voted}}_\text{control group} \]

Let’s start by practicing computing and interpreting means

mean(voting$voted)

[1] 0.3101759

# OR

voting %>% summarise(mean = mean(voted))

       mean
1 0.3101759

Interpretation?
- 31% of all the registered voters who were part of the experiment voted
Why in %?
- Because voted is binary
- Recall: The mean of a binary variable should be interpreted in % (after multiplying the output by 100)

mean(voting$voted) or summarise(mean = mean(voted)) compute the mean of voted for ALL the observations in the dataset
To compute the difference-in-means estimator, we need to calculate the mean of voted for two subsets of observations:
- the mean of voted for the treatment group, only for the observations that were treated (for which pressure equals 1)
- the mean of voted for the control group, only for the observations that were not treated (for which pressure equals 0)
- To do this, we need to learn how to use the [] operator and the group_by() function

Operator []:
- extracts a selection of observations from a variable
- to its left, we specify the variable we want to subset
- inside the square brackets, we specify the criteria of selection; we can specify a logical test using the relational operator ==; only the observations for which the test is true will be extracted
- example: data$var1[data$var2==1]
  
  # extracts the observations of the variable var1 for which the variable var2 equals 1
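
A minimal sketch of the [] operator on a toy dataframe. The dataframe toy and its values are hypothetical; the variable names follow the generic example above.

```r
# Hypothetical toy data (not the voting data)
toy <- data.frame(
  var1 = c(10, 20, 30, 40),
  var2 = c(1, 0, 1, 0)
)

toy$var1[toy$var2 == 1]       # values of var1 where var2 equals 1: 10 and 30
mean(toy$var1[toy$var2 == 1]) # their mean: 20
```

Wrapping the subset in mean() is the pattern used to compute the mean of the outcome for the treatment group only.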

group_by() function:
- It groups the observations according to the values of a variable
- We then use summarise() to compute the mean within each group
- example:
  
  data %>% group_by(var2) %>% summarise(mean = mean(var1))

Compute the mean of voted for the treatment and control groups, separately

mean(voting$voted[voting$pressure==1]) # treatment
## [1] 0.3779482
mean(voting$voted[voting$pressure==0]) # control
## [1] 0.2966383
# OR
voting %>% group_by(pressure) %>%  summarise(mean = mean(voted))
## # A tibble: 2 × 2
##   pressure  mean
##      <dbl> <dbl>
## 1        0 0.297
## 2        1 0.378

Interpretation of the first mean?
- 38% of the registered voters who received the message voted (0.38x100=38%)
Interpretation of the second mean?
- 30% of the registered voters who did not receive the message voted (0.30x100=30%)

Now, we can compute the difference-in-means estimator as the difference between the two means above:

mean(voting$voted[voting$pressure==1]) -
  mean(voting$voted[voting$pressure==0]) 
## [1] 0.08130991

direction, size, and unit of measurement of the effect?
- increase of 8 percentage points
increase because we are measuring a change in $Y$ and the number is positive
percentage points because it is the result of subtracting two percentages: %-% = p.p. (because voted is binary)
8 (and not 0.08) because we need to multiply the number by 100 to turn it into p.p. (because voted is binary)
38% - 30% = 8 p.p.
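
The conversion to percentage points can also be done in R. The sketch below types in the two group means reported above rather than loading voting.csv, and the object names are chosen here for illustration.

```r
# Group means reported above for the voting experiment
mean_treatment <- 0.3779482 # registered voters who received the message
mean_control   <- 0.2966383 # registered voters who did not

diff_in_means <- mean_treatment - mean_control

# Multiply by 100 and round to express the estimate in percentage points
round(diff_in_means * 100, 1)
```

This returns 8.1, which we report as an increase of about 8 percentage points.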

STEP 6. Write conclusion statement

What’s the assumption we are making when estimating the average causal effect? A: registered voters who received the message are comparable to registered voters who did not
Why is this a reasonable assumption? A: Data come from a randomised experiment
What’s the treatment? A: receiving the message inducing social pressure
What’s the outcome? A: probability of voting
What’s the direction, size, and unit of measurement of the average causal effect? A: an increase of 8 percentage points, on average

CONCLUSION STATEMENT

Assuming that [the treatment and control groups are comparable] (a reasonable assumption because …), we estimate that [the treatment] [increases/decreases] [the outcome] by [size and unit of measurement of the effect], on average.

Assuming that registered voters who received the message are comparable to the registered voters who did not (a reasonable assumption because the data come from a randomized experiment), we estimate that receiving the message inducing social pressure increases the probability of voting by 8 percentage points, on average.

Today’s class

Causal Effects
Treatment and Outcome Variables
Individual vs. Average Causal Effects
Randomized Experiments
Difference-in-Means Estimator
Units of Measurement of Means and Diffs-in-Means
In-Class Exercise: Does Social Pressure Increase the Probability of Turning Out To Vote?
How to Write a Conclusion Statement
R: ==, ifelse(), group_by(), summarise(), mean()
