POL269 Political Research
2024-02-19
MEASURE: To infer population characteristics via survey research
PREDICT: To make predictions
EXPLAIN: To estimate the causal effect of a treatment on an outcome
Prediction and Linear Regression
Example with Non-binary Outcome Variable:
Fit a linear model using the least squares method
Interpret coefficients
Make predictions
Measure how well the model fits the data
  X.1 X Assignment.1 Take.Home.Exam Course.total distinction
1   1 1           75             86         80.5         yes
2   2 2           75             86         80.5         yes
3   3 3           74             86         80.0         yes
4   4 4           55             86         70.5         yes
5   5 5           80             84         82.0         yes
6   6 6           78             82         80.0         yes
We find a moderately strong positive correlation.
Are we surprised by this number?
No: the scatter plot above showed a positive, moderately strong linear relationship, so it makes sense that the correlation coefficient is positive and closer to 1 than to 0.
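As a minimal sketch of how such a correlation might be computed in R, the block below applies cor() to the six rows printed above. The data frame name grades is an assumption introduced here for illustration, and because only those six rows are used rather than the full dataset, the resulting value will differ from the lecture's.

```r
# First six rows of the course data, copied from the printed table.
# ASSUMPTION: `grades` is an illustrative name; the full dataset is not shown,
# so the correlation computed here will not match the lecture's value.
grades <- data.frame(
  Assignment.1 = c(75, 75, 74, 55, 80, 78),
  Course.total = c(80.5, 80.5, 80.0, 70.5, 82.0, 80.0)
)

# Pearson correlation coefficient between the predictor and the outcome
r <- cor(grades$Assignment.1, grades$Course.total)
r
```

A value between 0 and 1 indicates a positive linear association; the closer to 1, the stronger it is.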
We now know that higher midterm scores are likely to be associated with higher final grades.
Ideally, we would like to summarize the relationship with a mathematical model so that we can use that model to make predictions.
lm()
Call:
lm(formula = Course.total ~ Assignment.1, data = data)
Coefficients:
(Intercept)  Assignment.1
    15.6116        0.7687
\(\widehat{\alpha} = 15.61\) and \(\widehat{\beta} = 0.77\)
The fitted line is \(\widehat{Y} = 15.61 + 0.77*X\)
More specifically: \(\widehat{\textrm{Course.total}} = 15.61 + 0.77*\textrm{Assignment.1}\)
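The fitting step might look like the sketch below. It assumes a data frame named grades (an illustrative name, not the lecture's data object) holding only the six rows printed earlier, so the estimates it produces will differ from the 15.61 and 0.77 reported for the full dataset.

```r
# ASSUMPTION: only the six printed rows are available here, so these
# estimates will not reproduce the lecture's 15.61 and 0.77.
grades <- data.frame(
  Assignment.1 = c(75, 75, 74, 55, 80, 78),
  Course.total = c(80.5, 80.5, 80.0, 70.5, 82.0, 80.0)
)

# Fit the line Course.total = alpha + beta * Assignment.1 by least squares
fit <- lm(Course.total ~ Assignment.1, data = grades)

# Extract the estimated intercept (alpha-hat) and slope (beta-hat)
alpha_hat <- coef(fit)[["(Intercept)"]]
beta_hat  <- coef(fit)[["Assignment.1"]]
c(alpha_hat = alpha_hat, beta_hat = beta_hat)
```

The formula puts the outcome on the left of ~ and the predictor on the right, matching the Call shown in the output above.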
geom_smooth()
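One way to draw the fitted line on top of the scatter plot is sketched below. It assumes the ggplot2 package is installed and again uses a hypothetical grades data frame built from the six printed rows, so the plotted line is only illustrative.

```r
library(ggplot2)

# ASSUMPTION: illustrative data frame built from the six printed rows only
grades <- data.frame(
  Assignment.1 = c(75, 75, 74, 55, 80, 78),
  Course.total = c(80.5, 80.5, 80.0, 70.5, 82.0, 80.0)
)

# Scatter plot with the least squares line added by geom_smooth();
# method = "lm" requests a linear fit, se = FALSE hides the confidence band
p <- ggplot(grades, aes(x = Assignment.1, y = Course.total)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
p
```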
5. Interpretation of Coefficients:
The intercept (\(\widehat{\alpha}\)) is the \(\widehat{Y}\) when \(X=0\)
15.61 is the \(\widehat{\textrm{Course.total}}\) when Assignment.1=0
The slope (\(\widehat{\beta}\)) is the \(\triangle \widehat{Y}\) associated with \(\triangle X=1\)
0.77 is the \(\triangle \widehat{\textrm{Course.total}}\) associated with \(\triangle\)Assignment.1=1
Understanding the Role the Intercept and the Slope Play in Defining a Line
\[ \widehat{Y} = \widehat{\alpha} + \widehat{\beta}X \]
\(\widehat{\alpha}\) (alpha-hat) is the estimated intercept coefficient
the \(\widehat{Y}\) when \(X = 0\)
(in the same unit of measurement as \(\widehat{Y}\))
\(\widehat{\beta}\) (beta-hat) is the estimated slope coefficient
the \(\triangle \widehat{Y}\) associated with \(\triangle X{=}\textrm{1}\)
(in the same unit of measurement as \(\triangle\overline{Y}\))
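These two interpretations can be checked numerically. The sketch below uses the rounded estimates from the fitted line; the helper predict_total() is a hypothetical name introduced here, and the scores 75 and 76 are arbitrary values one point apart.

```r
# Rounded estimates reported for the fitted line
alpha_hat <- 15.61
beta_hat  <- 0.77

# Hypothetical helper: predicted Course.total for a given Assignment.1 score
predict_total <- function(x) alpha_hat + beta_hat * x

# Intercept: the prediction when X = 0 equals alpha_hat
at_zero <- predict_total(0)

# Slope: raising X by 1 (here from 75 to 76, chosen arbitrarily) changes the
# prediction by beta_hat, up to floating-point rounding
one_unit_change <- predict_total(76) - predict_total(75)

c(at_zero = at_zero, one_unit_change = one_unit_change)
```

Because the model is a straight line, the one-unit change is the same wherever on the X axis it is measured.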
If you earn 80 points on the midterm, we would predict that you will get a final grade of 77.21 points, on average.
Note: \(\widehat{Y}\) is in the same unit of measurement as \(\overline{Y}\);
here, Y is non-binary and measured in points so \(\overline{Y}\) and \(\widehat{Y}\) are also measured in points
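The prediction of 77.21 for a midterm score of 80 follows from plugging \(X = 80\) into the fitted line with the rounded coefficients. The second line applies the slope to a hypothetical 10-point difference in \(X\), a value chosen here only for illustration:

\[ \widehat{Y} = 15.61 + 0.77 \times 80 = 15.61 + 61.60 = 77.21 \]

\[ \triangle\widehat{Y} = \widehat{\beta} \times \triangle X = 0.77 \times 10 = 7.70 \]

Both results are in points, the unit in which Course.total is measured.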
How to summarize the relationship between X and Y with a line: lm() and geom_smooth()
How to interpret the two estimated coefficients (\(\widehat{\alpha}\) and \(\widehat{\beta}\)) when the outcome variable is non-binary
How to make predictions with the fitted line: predict \(\widehat{Y}\) based on \(X\) and predict \(\triangle\widehat{Y}\) based on \(\triangle X\)
Another example of how to use the linear model to make predictions, but with a binary outcome
How to measure how well the model fits the data with \(\textrm{R}^2\)