Week 5: Predicting Non-Binary Outcomes Using Linear Regression

POL269 Political Research

Javier Sajuria

2024-02-19

Why do we analyse data?

MEASURE:To infer population characteristics via survey research

  • what proportion of constituents support a particular policy?

PREDICT:To make predictions

  • who is the most likely candidate to win an upcoming election?

EXPLAIN:To estimate the causal effect of a treatment on an outcome

  • what is the effect of small classrooms on student performance?

REVIEW

  1. When estimating causal effects (Chapter 2)
    1. \(X\) is the treatment variable
    2. \(Y\) is the outcome variable
    3. aim: to estimate the effect of \(X\) on \(Y\) without bias
    4. assumption: treatment and control groups were comparable before treatment was administered
    5. best way of satisfying assumption: random treatment assignment

REVIEW

  1. When inferring population characteristics (Chapter 3)
    1. aim: to infer from a sample the characteristics of the population without bias
    2. assumption: sample is representative of population
    3. best way of satisfying assumption: random sampling

REVIEW

  1. When making predictions (Chapter 4)
    1. \(X\) are predictors: variables that we use as the basis for our predictions
    2. \(Y\) is the outcome variable: what we want to predict
    3. \(\widehat{Y}\) is the predicted outcome: our predictions of \(Y\) based on the values of \(X\) and the model that summarizes the relationship between \(X\) and \(Y\)
    4. \(\widehat{\epsilon}\) are the errors of our predictions: \(\widehat{\epsilon} = Y - \widehat{Y}\)
    5. aim: to predict \(Y\) as accurately as possible, with the smallest errors possible
    6. best way to achieve our aim: find predictors that are highly correlated with the outcome so that the linear model will fit the data well (\(\textrm{R}^2\) will be high)

Plan for today

  • Prediction and Linear Regression

  • Example with Non-binary Outcome Variable:

    1. Using Midterm Scores to Predict Final Exam Scores
      1. Load and explore data
      2. Identify X and Y
      3. What is the relationship between X and Y?
        1. Create scatter plot
        2. Calculate correlation
  • Fit a linear model using the least squares method

  • Interpret coefficients

  • Make predictions

  • Measure how well the model fits the data

Using Midterm Scores to Predict Final Exam Scores

  • Today we will analyse real, historical student performance data from my class
  • Our goal is to model the relationship between midterm and final grades
  • So that we can later predict final grades based on midterm scores

  1. Load and explore data
library(ggplot2) # loads ggplot2, used for the plots below
data <- read.csv("grades.csv") # reads and stores data
head(data) # shows first observations
  X.1 X Assignment.1 Take.Home.Exam Course.total distinction
1   1 1           75             86         80.5         yes
2   2 2           75             86         80.5         yes
3   3 3           74             86         80.0         yes
4   4 4           55             86         70.5         yes
5   5 5           80             84         82.0         yes
6   6 6           78             82         80.0         yes
  • what’s the unit of observation?
  • for each variable: type and unit of measurement?
  • substantively interpret the first observation
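
One way to answer these questions is to inspect the data a bit further; a minimal sketch using base R (the object name data comes from the chunk above):

nrow(data)    # number of observations (one row per student)
str(data)     # name and type of each variable
summary(data) # range, quartiles, and mean of each numeric variable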

  1. Identify X and Y
  • The predictor (X) is the variable we want to use to predict the outcome (Y)
    • in this case, the predictor is Assignment.1 (the midterm score)
    • let’s visualize the distribution of Assignment.1
data |> ggplot(aes(x = Assignment.1)) + geom_histogram(binwidth = 5)

  • The outcome (Y) is the variable that we want to predict
    • in this case, the outcome variable is Course.total (the final course grade)
    • let’s visualize the distribution of Course.total
data |> ggplot(aes(x = Course.total)) + geom_histogram(binwidth = 5)
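
To complement the histograms, a short sketch of summary statistics for both variables (mean(), sd(), and range() are base R; na.rm = TRUE is included defensively in case of missing scores):

mean(data$Assignment.1, na.rm = TRUE)  # average midterm score
sd(data$Assignment.1, na.rm = TRUE)    # spread of midterm scores
range(data$Assignment.1, na.rm = TRUE) # lowest and highest midterm score
mean(data$Course.total, na.rm = TRUE)  # average final grade
sd(data$Course.total, na.rm = TRUE)    # spread of final grades
range(data$Course.total, na.rm = TRUE) # lowest and highest final grade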

  1. What is the relationship between X and Y?
    1. Create scatter plot to visualize the relationship between Assignment.1 and Course.total
data |> ggplot(aes(x = Assignment.1, y = Course.total)) + geom_point() 

  • note that the Y variable always goes on the y-axis and the X variable always goes on the x-axis
  • what does each dot represent?
  • does the relationship look positive or negative?
  • does the relationship look weakly or strongly linear?

  • Calculate correlation to measure the direction and strength of the linear association between Assignment.1 and Course.total (a quick by-hand check follows the bullets below)
cor(data$Assignment.1, data$Course.total)
[1] 0.8652242
  • we find a moderately strong positive correlation

    • are we surprised by this number?

    • no: in the scatter plot above we observed that the relationship was positive and moderately strongly linear, so it makes sense that the correlation coefficient is positive and closer to 1 than to 0

    • we now know that higher midterm scores are likely to be associated with higher final grades

    • ideally, we would like to summarize the relationship with a mathematical model so that we can then use that model to make predictions
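
As a quick check of the correlation reported above (a sketch, not part of the required analysis), the same number can be computed from its definition: the covariance of the two variables divided by the product of their standard deviations.

# correlation from its definition: cov(X, Y) / (sd(X) * sd(Y))
cov(data$Assignment.1, data$Course.total) /
  (sd(data$Assignment.1) * sd(data$Course.total))
# should match cor(data$Assignment.1, data$Course.total), about 0.87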

  • To summarize the relationship between \(X\) and \(Y\), we can use a linear model
  • Which line better summarizes the relationship?

  • The goal is to choose the line that best fits the data
    • pick the line closest to the data
    • which has the smallest prediction errors (vertical distance between dots and the line)

  • To choose the line of best fit, we use the least squares method:
    • chooses the line that minimizes prediction errors
    • in particular, it minimizes \(\sum^{n}_{i=1} \widehat{\epsilon}_i^{\,2}\) (the sum of the squared prediction errors, or residuals)
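
A small sketch of what "minimizing the sum of squared prediction errors" means, assuming the data frame loaded above; the two candidate intercept/slope pairs are arbitrary illustrations, and the least squares method is the procedure that searches for the pair making this sum as small as possible.

# sum of squared prediction errors for a candidate line with a given intercept and slope
ssr <- function(intercept, slope) {
  errors <- data$Course.total - (intercept + slope * data$Assignment.1)
  sum(errors^2)
}
ssr(10, 1.0) # total squared error for one candidate line
ssr(20, 0.7) # total squared error for another candidate line
# lm() (introduced below) returns the intercept and slope that minimize this quantity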

  • Understanding How the Least Squares Method Chooses the Line of Best Fit (link to interactive graph)

  • A line is defined by two coefficients:
    • intercept specifies the vertical location of the line
    • slope specifies the angle or steepness of the line
  • Mathematically, the fitted line is \(\widehat{Y} = \widehat{\alpha} + \widehat{\beta} X\)
    • \(\widehat{\alpha}\) is the intercept
    • \(\widehat{\beta}\) is the slope

  • If you learned that a line was \(Y = mX + b\)
    • think that \(m\) is now \(\widehat{\beta}\)
    • think that \(b\) is now \(\widehat{\alpha}\)
  • ^ (called ‘hat’) stands for predicted or estimated
    • \(\widehat{Y}\) is the predicted outcome
    • \(\widehat{\alpha}\) and \(\widehat{\beta}\) are the estimated coefficients

  • R function to fit a linear model: lm()
    • required argument: a formula of the type \(Y \sim X\)
lm(Course.total ~ Assignment.1, data = data)

Call:
lm(formula = Course.total ~ Assignment.1, data = data)

Coefficients:
 (Intercept)  Assignment.1  
     15.6116        0.7687  
  • \(\widehat{\alpha} = 15.61\) and \(\widehat{\beta} = 0.77\)

  • The fitted line is \(\widehat{Y} = 15.61 + 0.77*X\)

  • More specifically: \(\widehat{\textrm{Course.total}} = 15.61 + 0.77 \times \textrm{Assignment.1}\)

  • We can now add the fitted line to the scatter plot above using geom_smooth()
data |> 
  ggplot(aes(x = Assignment.1, y = Course.total)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
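
If you want to reuse the estimated coefficients, a sketch along these lines works: coef() extracts \(\widehat{\alpha}\) and \(\widehat{\beta}\) from a stored model, and geom_abline() offers an alternative to geom_smooth() by drawing the line directly from those two numbers.

fit <- lm(Course.total ~ Assignment.1, data = data) # store the fitted model
coef(fit) # estimated intercept (alpha-hat) and slope (beta-hat)

# same plot as above, drawn from the stored coefficients instead of geom_smooth()
data |>
  ggplot(aes(x = Assignment.1, y = Course.total)) +
  geom_point() +
  geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2], colour = "blue")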

5. Interpretation of Coefficients:

The intercept (\(\widehat{\alpha}\)) is the \(\widehat{Y}\) when \(X\)=0

  • Find 0 on the X-axis, go up to the line, find the value of \(\widehat{Y}\) associated with X=0
  • here: \(\widehat{\alpha} = 15.61\)

Mathematical definition of \(\widehat{\alpha}\)

\[\begin{align*} \widehat{Y} &= \widehat{\alpha} + \widehat{\beta}\,\, X & \color{gray}(\textrm{by definition})\\ \widehat{Y} &= \widehat{\alpha} + \widehat{\beta}\times 0 &\color{gray}(\textrm{if } X=0)\\ \widehat{Y} &= \widehat{\alpha} + 0 &\color{gray}(\textrm{if } X=0)\\ \widehat{Y} &= \widehat{\alpha} &\color{gray}(\textrm{if } X=0) \end{align*}\]

\(\widehat{\alpha}\) is the value of \(\widehat{Y}\) when \(X=0\)

  • substantive interpretation of \(\widehat{\alpha}\)?
    • start with mathematical definition:
      • \(\widehat{\alpha}\) is the \(\widehat{Y}\) when X=0
    • substitute X, Y, and \(\widehat{\alpha}\):
      • \(\widehat{\alpha}\) = 15.61 is the \(\widehat{\textrm{Course.total}}\) when Assignment.1=0
    • put it in words (using units of measurement):
      • when a student scores 0 points in the midterm, we predict that their final grade will be 15.61 points, on average (checked in code after this list)
      • caution: this interpretation can be nonsensical when X = 0 falls outside the range of the observed data (extrapolation)
  • unit of measurement of \(\widehat{\alpha}\)?
    • same as \(\overline{Y}\); here, Y is non-binary and measured in points so \(\overline{Y}\) and \(\widehat{\alpha}\) are measured in points
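
A quick check of this definition in R (a sketch, assuming a stored model object; predict() with a newdata data frame returns \(\widehat{Y}\) for the supplied values of X):

fit <- lm(Course.total ~ Assignment.1, data = data) # fitted model
predict(fit, newdata = data.frame(Assignment.1 = 0)) # Y-hat when X = 0: equals alpha-hat, about 15.61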

5. Interpretation of Coefficients

The slope (\(\widehat{\beta}\)) is the \(\triangle \widehat{Y}\) associated with \(\triangle X\)=1

  • Pick two points on the line, measure \(\triangle \widehat{Y}\) and \(\triangle X\) associated with the two points, calculate \(\triangle \widehat{Y}\)/\(\triangle X\)
  • here: \(\widehat{\beta}\) = \(\frac{\textrm{rise}}{\textrm{run}} = \frac{\textrm{77.11-(15.61)}}{\textrm{80-0}}=\frac{\textrm{61.5}}{\textrm{80}}=\textrm{0.77}\)

\[\begin{eqnarray*} \triangle \widehat{Y} &=& \widehat{Y}_{\textrm{final}} {-} \widehat{Y}_{\textrm{initial}}\\ \triangle X &=& X_{\textrm{final}} {-} X_{\textrm{initial}}\\ \end{eqnarray*}\]

Mathematical Definition of \(\widehat{\beta}\)

\[\begin{align*} \triangle \widehat{Y} &= \widehat{Y}_{\textrm{final}} {-} \widehat{Y}_{\textrm{initial}} \\ \triangle \widehat{Y} &= (\widehat{\alpha} + \widehat{\beta} X_{\textrm{final}}) - (\widehat{\alpha} + \widehat{\beta} X_{\textrm{initial}}) \\ \triangle \widehat{Y} &= \widehat{\alpha} - \widehat{\alpha} + \widehat{\beta} \,(X_{\textrm{final}}{-} X_{\textrm{initial}}) \\ \triangle \widehat{Y} &= \widehat{\beta} \,(X_{\textrm{final}}{-} X_{\textrm{initial}}) \\ \triangle \widehat{Y} &= \widehat{\beta} \,(\triangle X) \\ \triangle \widehat{Y} &= \widehat{\beta} \times 1 &\color{gray}(\textrm{if } \triangle X=1)\\ \triangle \widehat{Y} &= \widehat{\beta} \end{align*}\]

\(\widehat{\beta}\) is the value of \(\triangle \widehat{Y}\) associated with \(\triangle X\) = 1

  • substantive interpretation of \(\widehat{\beta}\)?
    • start with mathematical definition:
      • \(\widehat{\beta}\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X\)=1
    • substitute X, Y, and \(\widehat{\beta}\):
      • \(\widehat{\beta}\) = 0.77 is the \(\triangle \widehat{\textrm{Course.total}}\) associated with \(\triangle\)Assignment.1 = 1
    • put it in words (using units of measurement):
      • an increase in midterm scores of 1 point is associated with a predicted increase in final grades of 0.77 points, on average (checked in code after this list)
    • has the same sign as the cor(X,Y) (always the case!)
  • unit of measurement of \(\widehat{\beta}\)?
    • same as \(\triangle \overline{Y}\); here, Y is non-binary and measured in points so \(\triangle \overline{Y}\) and \(\widehat{\beta}\) are measured in points
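
And a quick check of the slope's definition: the difference between the predictions at two midterm scores one point apart should equal \(\widehat{\beta}\), whichever pair of scores we choose (a sketch, assuming the same stored model):

fit <- lm(Course.total ~ Assignment.1, data = data) # fitted model
preds <- predict(fit, newdata = data.frame(Assignment.1 = c(60, 61))) # any pair one point apart works
preds[2] - preds[1] # change in Y-hat for a one-point change in X: beta-hat, about 0.77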

Understanding the Role the Intercept and the Slope Play in Defining a Line (link to interactive graph)

THE FITTED LINE IS:

\[ \widehat{Y} = \widehat{\alpha} + \widehat{\beta}X \]

  • \(\widehat{\alpha}\) (alpha-hat) is the estimated intercept coefficient

    the \(\widehat{Y}\) when \(X = 0\)

    (in the same unit of measurement as \(\widehat{Y}\))

  • \(\widehat{\beta}\) (beta-hat) is the estimated slope coefficient

    the \(\triangle \widehat{Y}\) associated with \(\triangle X{=}\textrm{1}\)

    (in the same unit of measurement as \(\triangle\overline Y\))

  1. Make Predictions
  • Now that we have found the line that best summarizes the relationship between X and Y, we can use it to make predictions
  • Two types of predictions we might be interested in:
    1. predict \(\widehat{Y}\) based on \(X\): \(\widehat{Y} = \widehat{\alpha} + \widehat{\beta} X\)
    2. predict \(\triangle \widehat{Y}\) associated with \(\triangle X\): \(\triangle \widehat{Y} = \widehat{\beta} \triangle X\)

To predict \(\widehat{Y}\) based on \(X\): \(\widehat{Y} = \widehat{\alpha} + \widehat{\beta} X\)

  • Example 1: Imagine you earn 80 points in the midterm, what would we predict your final exam score will be?
\[\begin{eqnarray*} \widehat{Course.total} &=& 15.61 + 0.77 \times Assignment.1 \\ \widehat{Course.total} &=& 15.61 + 0.77 \times 80 \,\, (\textrm{if } Assignment.1 = 80) \\ \widehat{Course.total} &=& 77.21 \\ \end{eqnarray*}\]
  • If you earn 80 points in the midterm, we would predict that you will get a final grade of 77.21, on average

  • Note: \(\widehat{Y}\) is in the same unit of measurement as \(\overline{Y}\);

    here, Y is non-binary and measured in points so \(\overline{Y}\) and \(\widehat{Y}\) are also measured in points

  • Example 2: Imagine you earn 90 points in the midterm, what would we predict your final grade will be?
\[\begin{eqnarray*} \widehat{Course.total} &=& 15.61 + 0.77 \times Assignment.1 \\ \widehat{Course.total} &=& 15.61 + 0.77 \times 90 \,\, (\textrm{if } Assignment.1 = 90)\\ \widehat{Course.total} &=& 84.91 \\ \end{eqnarray*}\]
  • If you earn 90 points in the midterm, we would predict that you will get a final grade of 84.91, on average
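
The same two predictions can be obtained in R with predict(), assuming the fitted model is stored as fit; the results differ slightly from the hand calculations because the coefficients were rounded to two decimals above.

fit <- lm(Course.total ~ Assignment.1, data = data) # fitted model
predict(fit, newdata = data.frame(Assignment.1 = c(80, 90))) # predicted final grades for midterm scores of 80 and 90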

To predict \(\triangle \widehat{Y}\) associated with \(\triangle X\): \(\triangle \widehat{Y} = \widehat{\beta} \, \triangle \text{X}\)

  • Example 3: What is the predicted change in final exam scores associated with an increase in midterm scores of 10 points?
\[\begin{eqnarray*} \triangle\widehat{Course.total} &=& 0.77 \triangle Assignment.1\\ \triangle \widehat{Course.total} &=& 0.77 \times 10 \\ \triangle \widehat{Course.total} &=& 7.7 \\ \end{eqnarray*}\]
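
The change prediction only needs the slope, so one line with coef() is enough (again a sketch; the result is about 7.69 rather than exactly 7.7 because the unrounded slope is 0.7687):

fit <- lm(Course.total ~ Assignment.1, data = data) # fitted model
coef(fit)["Assignment.1"] * 10 # predicted change in Course.total for a 10-point increase in Assignment.1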

  1. Measure how well the model fits the data with \(\textrm{R}^2\)
  • We will see how to do this next lecture

Today’s lecture

  • How to summarize the relationship between X and Y with a line: lm() and geom_smooth()

  • How to interpret the two estimated coefficients: (\(\widehat{\alpha}\) and \(\widehat{\beta}\)) when outcome variable is non-binary

  • How to make predictions with the fitted line: predict \(\widehat{Y}\) based on \(X\) and predict \(\triangle\widehat{Y}\) based on \(\triangle X\)

Next Class

  • Another example of how to use the linear model to make predictions, but with binary outcome

  • How to measure how well the model fits the data with \(\textrm{R}^2\)