Week 9: Probability
POL269 Political Research
2024-03-18
Plan for Today
Probability
Events, Random Variables, and Probability Distributions
Probability Distributions
Population Parameters vs. Sample Statistics
Law of Large Numbers and Central Limit Theorem
Probability
- There are two different ways of interpreting probability
- According to the frequentist interpretation, the probability of an event is the proportion of its occurrence among infinitely many identical trials
- Example: probability of heads when flipping a coin
- According to the Bayesian interpretation, probabilities represent one’s subjective beliefs about the relative likelihood of events
- Example: probability of rain in the afternoon
Events, Random Variables, and Probability Distributions
- Most things in our lives can be considered events (sets of outcomes that occur with a particular probability)
- event: being 6 feet or taller
- As soon as we assign a number to an event, we create what is known as a random variable (a random variable assigns a numeric value to each mutually exclusive event produced by a trial)
- random variable tall \[\textrm{tall}_i = \begin{cases} \textrm{1} \text{ if individual i is 6 feet or taller}\\[3pt] \textrm{0} \text{ if individual i is not}\end{cases}\]
- Each random variable has a probability distribution, which characterizes the likelihood of each value the variable can take
- probability distribution of tall:
- \(P(tall=1)\) = probability of being tall
- \(P(tall = 0)\) = probability of not being tall
- All probabilities in a distribution must add up to 1
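The ideas above can be sketched in Python; the heights below are made up for illustration:

```python
# A sketch of turning an event into a random variable; the heights
# (in feet) are hypothetical example data.
heights = [5.9, 6.2, 5.7, 6.1, 6.4, 5.8, 5.5, 6.0]

# Random variable tall_i: 1 if individual i is 6 feet or taller, else 0
tall = [1 if h >= 6.0 else 0 for h in heights]

# Empirical probability distribution of tall
p_tall = sum(tall) / len(tall)   # P(tall = 1)
p_not = 1 - p_tall               # P(tall = 0)

print(p_tall, p_not)             # the two probabilities sum to 1
```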
Probability Distributions
- In chapter 1, we distinguished between binary and non-binary (random) variables
- binary variables can take only two values (e.g., 0 or 1)
- non-binary variables can take more than two values
- We will focus on two different types of probability distributions
- Bernoulli distribution: probability distribution of a binary variable
- Normal distribution: probability distribution we commonly use as a good approximation for many non-binary variables
Bernoulli Distribution
- Probability distribution of a binary variable
- It is characterized by one parameter: \(p\)
- Recall, all probabilities in a distribution must add up to 1
- If \(P(X=1) = p\), then \(P(X=0)\) = \(1-p\)
- If \(P(tall=1)=0.80\), then what is \(P(tall=0)\)?
- The mean of a Bernoulli distribution is \(p\)
- The variance of a Bernoulli distribution is \(p(1-p)\)
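A minimal simulation, assuming \(p = 0.8\), checks that the sample mean of Bernoulli draws approaches \(p\) and the sample variance approaches \(p(1-p)\):

```python
import random

random.seed(269)  # arbitrary seed, for reproducibility

# Simulate a Bernoulli(p) variable: each draw is 1 with probability p
p = 0.8
draws = [1 if random.random() < p else 0 for _ in range(100_000)]

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)

print(round(mean, 2))   # close to p = 0.8
print(round(var, 2))    # close to p(1 - p) = 0.16
```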
Example: Passing this module
- Event: passing the class
- Random Variable: \(pass = \{0,1,1,1,1,1,1,1,1,1\}\)
\[
\textrm{where: pass}_i = \begin{cases} 1 \text{ if student i passed the class} \\ 0 \text{ if student i didn't pass the class}\end{cases}
\]
- Probability distribution: Bernoulli where \(p= ?\)
- \(P(pass=1)=p\)
- \(P(pass=0)=1-p\)
\[
\textrm{pass} = \{0,1,1,1,1,1,1,1,1,1\}
\]
What is the probability that pass = 1? What is \(p\)?
\[\begin{align*}
\textrm{P(pass=1)} & = \frac{\textrm{number of students who passed}}{\textrm{total number of students}}\\[.3cm]& = \frac{\textrm{frequency of 1s}}{\textrm{total number of observations}} = \textrm{?}\\
\end{align*}\]
\(P(pass=1) = p = 0.90\)
Interpretation?
- The probability of passing the class is 90%
\[
\textrm{pass} = \{0,1,1,1,1,1,1,1,1,1\}
\]
- What is the probability that pass = 0? What is 1-\(p\)?
\[\begin{align*}
\textrm{P(pass=0)} & = \frac{\textrm{number of students who didn't pass}}{\textrm{total number of students}}\\[.3cm]& = \frac{\textrm{frequency of 0s}}{\textrm{total number of observations}} = \textrm{?}\\
\end{align*}\]
- \(P(pass=0)=1-p = 1-0.90 = 0.10\). Interpretation?
- The probability of failing the class is 10%
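The computation on this slide can be done directly from the pass data:

```python
# Estimating the Bernoulli parameter p from the pass data above
passed = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

p = sum(passed) / len(passed)   # frequency of 1s / total observations

print(p)                # 0.9 -> P(pass = 1)
print(round(1 - p, 2))  # 0.1 -> P(pass = 0)
```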
Normal Distribution
- Probability distribution we commonly use as a good approximation for many non-binary variables
- It is characterized by two parameters: \(\mu\) (mu, the mean) and \(\sigma^{2}\) (sigma-squared, the variance)
- Normal random variables are variables that follow a normal distribution
\[
X \sim N(\mu, \sigma^2)
\]
- Examples of normal distributions
- What’s the mean and variance of \(N\)(0,1)?
- The probability density function of the normal distribution represents the likelihood of each possible value the normal random variable can take
- We can use it to compute the probability that X takes a value within a given range:
\[
\textrm{P}(\textrm{x}_{1} \leq X \leq \textrm{x}_{2}) = \textrm{area under the curve between } \textrm{x}_{1} \textrm{ and } \textrm{x}_{2}
\]
\[
\textrm{P}(-1 \leq X \leq 0) < \textrm{or} > \textrm{P}(1 \leq X \leq 2) \textrm{?}
\]
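The question above can be answered numerically. This sketch builds a normal CDF from the standard library's `erf` function (no external packages assumed) and compares the two areas:

```python
from math import erf, sqrt

# Normal CDF built from the error function (standard library only)
def norm_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# P(x1 <= X <= x2) = CDF(x2) - CDF(x1), i.e., the area between x1 and x2
p_mid = norm_cdf(0) - norm_cdf(-1)    # P(-1 <= X <= 0), near the mean
p_tail = norm_cdf(2) - norm_cdf(1)    # P(1 <= X <= 2), further out

print(round(p_mid, 3))    # 0.341
print(round(p_tail, 3))   # 0.136 -> less area further from the mean
```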
The Standard Normal Distribution
- Normal distribution with mean 0 and variance 1 (and standard deviation = 1)
- In mathematical notation, we refer to the standard normal random variable as \(Z\) and write it as:
\[
Z \sim N\textrm{(0, 1)}
\]
- Note this \(Z\) has nothing to do with confounding variables
- \(Z\) has two useful properties…
First, since \(Z\) is symmetric and centered at 0:
\[
\textrm{P}(Z\leq{-}\textrm{z}) = \textrm{P}(Z \geq \textrm{z}) \qquad \qquad \textrm{ (where } \textrm{z} \geq 0)
\]
Second, about 95% of the observations of \(Z\) are between -1.96 and 1.96:
\[
\textrm{P}(\textrm{-1.96} \leq Z \leq\textrm{1.96}) \approx \textrm{0.95}
\]
Population Parameters vs. Sample Statistics
- When we analyze data, we are usually interested in the value of a parameter at the population level
- Example: proportion of candidate A supporters among all voters in a country
- We typically only have access to statistics from a small sample of observations drawn from the target population
- Example: proportion of supporters among the voters who responded to a survey
- The sample statistics differ from the population parameters because the sample contains noise
- This noise comes from sampling variability
Sampling variability
- Refers to the fact that the value of a statistic varies from one sample to another because each sample contains a different set of observations drawn from the target population
- This is true even when the samples are drawn using exactly the same method, such as random sampling
- Smaller sample size generally leads to greater sampling variability
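A quick simulation illustrates both points, assuming a hypothetical population in which 55% of voters support candidate A:

```python
import random

random.seed(269)  # arbitrary seed, for reproducibility

p_true = 0.55  # hypothetical population proportion of supporters

def sample_proportion(n):
    """Proportion of supporters in one random sample of size n."""
    return sum(1 if random.random() < p_true else 0 for _ in range(n)) / n

small = [sample_proportion(100) for _ in range(5)]     # five samples, n = 100
large = [sample_proportion(10_000) for _ in range(5)]  # five samples, n = 10,000

print(small)   # estimates bounce around 0.55 noticeably
print(large)   # estimates cluster much more tightly around 0.55
```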
What proportion of US voters supports candidate A?
- If we draw a random sample from the population over and over again, we will get different proportions of support (\(\overline{X}\))
- Again, this is due to sampling variability
- How can we figure out what we want to know: the proportion of support among the whole population?
- The two large sample theorems—the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT)—help us understand the relationship between population parameters and sample statistics
- As we will see next class, we can use the CLT to draw conclusions about population parameters using data from just a sample
- Let’s define the different terms…
- Population Parameters: population characteristics that we might be interested in knowing
- \(E(X)\) (expectation of X): population mean of the random variable X
- \(V(X)\) (variance of X): population variance of the random variable X
- Sample Statistics: what we can measure by drawing a sample of n observations from the population. (Problem: they vary from sample to sample; they contain noise)
- \(\overline{X}\) (sample mean of X): average value of X in a particular sample
- \(var(X)\) (sample variance of X): variance of X in a particular sample
The Law of Large Numbers
As the sample size increases, the sample mean of \(X\) approximates the population mean of \(X\)
\[
\textrm{as } n \textrm{ increases, }\,\,\,\, \overline{X} = \frac{\sum^{n}_{i{=}\textrm{1}} X_i}{n} \,\,\,\,\approx \,\,\,\, E (X)
\]
- Example in the book: Proportion of support in a sample of 1 million observations is likely to be closer to the proportion of support in the whole population than the proportion of support in a sample of 1 thousand observations
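A minimal simulation of the LLN, assuming \(X \sim\) Bernoulli(0.55) so that \(E(X) = 0.55\):

```python
import random

random.seed(269)  # arbitrary seed, for reproducibility

def sample_mean(n):
    """Mean of one random sample of n draws from X ~ Bernoulli(0.55)."""
    return sum(1 if random.random() < 0.55 else 0 for _ in range(n)) / n

# The sample mean tends to get closer to E(X) = 0.55 as n grows
for n in [100, 10_000, 1_000_000]:
    print(n, sample_mean(n))
```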
The Central Limit Theorem
As the sample size increases, the standardized sample mean of \(X\) can be approximated by the standard normal distribution
\[
\textrm{as } n \textrm{ increases, }\,\,\,\, \frac{\overline{X}-E(X)}{\sqrt{V(X)/n}} \,\,\, \stackrel{\textrm{approx.}}{\sim} \,\,\, N \textrm{(0, 1)}
\]
- Example in the book: Even when the random variable we draw from is binary, if we draw repeated large samples, the standardized sample means will approximately follow a standard normal distribution
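The book's example can be sketched as a simulation, assuming \(X \sim\) Bernoulli(\(p\)) so that \(E(X) = p\) and \(V(X) = p(1-p)\):

```python
import random
from math import sqrt

random.seed(269)  # arbitrary seed, for reproducibility

p, n = 0.55, 1_000              # hypothetical binary variable, sample size
e_x, v_x = p, p * (1 - p)       # E(X) and V(X) for a Bernoulli(p) variable

def standardized_sample_mean():
    """Draw one sample of size n and standardize its mean."""
    x_bar = sum(1 if random.random() < p else 0 for _ in range(n)) / n
    return (x_bar - e_x) / sqrt(v_x / n)

z_values = [standardized_sample_mean() for _ in range(4_000)]

# If the CLT holds, about 95% of standardized means fall in [-1.96, 1.96]
share = sum(1 for z in z_values if -1.96 <= z <= 1.96) / len(z_values)
print(round(share, 3))   # close to 0.95
```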
- If we drew multiple samples of 1,000 observations from a random variable, computed the sample mean for each sample, and observed that the sample means were centred at 10 with variance 0.002, what would be your best guess for:
- \(E(X)\) (the population mean of the random variable)?
- Answer: \(E(X)\) \(\approx\) 10
- Recall: mean of \(\overline{X} \approx E(X)\)
- \(V(X)\) (the population variance of the random variable)?
- Answer: \(V(X)\) \(\approx\) 2
- Recall: variance of \(\overline{X}\) \(\approx\) \(V(X)\)/n; so 0.002 \(\approx\) \(V(X)\)/1,000, and thus \(V(X) \approx\) 1,000*0.002 \(\approx\) 2
Today’s Class
Probability
Events, Random Variables, and Probability Distributions
Population Parameters vs. Sample Statistics
Law of Large Numbers and Central Limit Theorem
Next Class
- Hypothesis Testing with Coefficients (We will use the CLT to determine whether an average causal effect is likely to be different from zero at the population level)