Week 9: Probability

POL269 Political Research

Javier Sajuria

2024-03-18

Plan for Today

  • Probability

  • Events, Random Variables, and Probability Distributions

  • Probability Distributions

    • Bernoulli Distribution

    • Normal Distribution

    • The Standard Normal Distribution

  • Population Parameters vs. Sample Statistics

  • Law of Large Numbers and Central Limit Theorem

Probability

  • There are two different ways of interpreting probability
  • According to the frequentist interpretation, the probability of an event is the proportion of its occurrence among infinitely many identical trials
    • Example: probability of heads when flipping a coin
  • According to the Bayesian interpretation, probabilities represent one’s subjective beliefs about the relative likelihood of events
    • Example: probability of rain in the afternoon
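The frequentist interpretation can be illustrated with a short simulation: flip a fair coin many times and watch the proportion of heads settle near its probability. A minimal Python sketch (the fair coin and number of flips are assumptions for illustration):

```python
import random

random.seed(1)  # for reproducibility

# Frequentist idea: the probability of heads is the long-run
# proportion of heads across many identical trials
flips = [random.randint(0, 1) for _ in range(100_000)]
prop_heads = sum(flips) / len(flips)
print(round(prop_heads, 3))  # close to 0.5
```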

Events, Random Variables, and Probability Distributions

  • Most things in our lives can be considered events (sets of outcomes that occur with a particular probability)
    • event: being 6 feet or taller
  • As soon as we assign a number to an event, we create what is known as a random variable (a random variable assigns a numeric value to each mutually exclusive event produced by a trial)
    • random variable tall \[\textrm{tall}_i = \begin{cases} \textrm{1} \text{ if individual i is 6 feet or taller}\\[3pt] \textrm{0} \text{ if individual i is not}\end{cases}\]

  • Each random variable has a probability distribution, which characterizes the likelihood of each value the variable can take
    • probability distribution of tall:
      • \(P(tall=1)\) = probability of being tall
      • \(P(tall = 0)\) = probability of not being tall
  • All probabilities in a distribution must add up to 1

Probability Distributions

  • In chapter 1, we distinguished between binary and non-binary (random) variables
    • binary variables are ….
    • non-binary variables are …
  • We will focus on two different types of probability distributions
    • Bernoulli distribution: probability distribution of a binary variable
    • Normal distribution: probability distribution we commonly use as a good approximation for many non-binary variables

Bernoulli Distribution

  • Probability distribution of a binary variable
  • It is characterized by one parameter: \(p\)
    • Recall, all probabilities in a distribution must add up to 1
    • If \(P(X=1) = p\), then \(P(X=0) = 1-p\)
    • If \(P(tall=1)=0.80\), then what is \(P(tall=0)\)?
  • The mean of a Bernoulli distribution is \(p\)
  • The variance of a Bernoulli distribution is \(p(1-p)\)
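These two formulas can be verified by simulation. A minimal Python sketch, assuming \(p = 0.80\) as in the tall example:

```python
import random

random.seed(42)
p = 0.8  # assumed parameter, as in the "tall" example

# Draw many Bernoulli(p) outcomes and compare the sample mean and
# variance with the theoretical values p and p(1-p)
draws = [1 if random.random() < p else 0 for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(round(mean, 2))  # close to p = 0.8
print(round(var, 2))   # close to p(1-p) = 0.16
```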

Example: Passing this module

  • Event: passing the class
  • Random Variable: \(pass = \{0,1,1,1,1,1,1,1,1,1\}\)

\[ \textrm{where: pass}_i = \begin{cases} 1 \text{ if student i passed the class} \\ 0 \text{ if student i didn't pass the class}\end{cases} \]

  • Probability distribution: Bernoulli where \(p= ?\)
    • \(P(pass=1)=p\)
    • \(P(pass=0)=1-p\)

\[ \textrm{pass} = \{0,1,1,1,1,1,1,1,1,1\} \]

What is the probability that pass = 1? What is \(p\)?

\[\begin{align*} \textrm{P(pass=1)} & = \frac{\textrm{number of students who passed}}{\textrm{total number of students}}\\[.3cm]& = \frac{\textrm{frequency of 1s}}{\textrm{total number of observations}} = \textrm{?}\\ \end{align*}\]
  • \(P(pass=1) = p = 0.90\)

  • Interpretation?

    • The probability of passing the class is 90%

\[ \textrm{pass} = \{0,1,1,1,1,1,1,1,1,1\} \]

  • What is the probability that pass = 0? What is 1-\(p\)?
\[\begin{align*} \textrm{P(pass=0)} & = \frac{\textrm{number of students who didn't pass}}{\textrm{total number of students}}\\[.3cm]& = \frac{\textrm{frequency of 0s}}{\textrm{total number of observations}} = \textrm{?}\\ \end{align*}\]
  • \(P(pass=0)=1-p = 1-0.90 = 0.10\). Interpretation?
    • The probability of failing the class is 10%
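The computation above takes only a few lines of Python; the data vector comes straight from the slides:

```python
# The pass data from the slides: one 0 (fail) and nine 1s (pass)
pass_data = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

p = sum(pass_data) / len(pass_data)  # frequency of 1s / total observations
print(p)                # 0.9 -> P(pass = 1)
print(round(1 - p, 2))  # 0.1 -> P(pass = 0)
```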

Normal Distribution

  • Probability distribution we commonly use as a good approximation for many non-binary variables
  • It is characterized by two parameters: \(\mu\) (mu, the mean) and \(\sigma^{2}\) (sigma-squared, the variance)

  • Normal random variables are variables that follow a normal distribution
    \[ X \sim N(\mu, \sigma^2) \]
  • Examples of normal distributions
  • What’s the mean and variance of \(N(0, 1)\)?

  • The probability density function of the normal distribution represents the likelihood of each possible value the normal random variable can take
  • We can use it to compute the probability that X takes a value within a given range:

\[ \textrm{P}(\textrm{x}_{1} \leq X \leq \textrm{x}_{2}) = \textrm{area under the curve between } \textrm{x}_{1} \textrm{ and } \textrm{x}_{2} \]

\[ \textrm{P}(-1 \leq X \leq 0) < \textrm{or} > \textrm{P}(1 \leq X \leq 2) \textrm{?} \]
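The question above can be answered numerically with the normal CDF, which Python's standard library provides via `statistics.NormalDist` (the cut-points are the ones from the slide):

```python
from statistics import NormalDist

X = NormalDist(mu=0, sigma=1)

# P(x1 <= X <= x2) = CDF(x2) - CDF(x1): the area under the curve
left = X.cdf(0) - X.cdf(-1)   # P(-1 <= X <= 0)
right = X.cdf(2) - X.cdf(1)   # P(1 <= X <= 2)
print(round(left, 4), round(right, 4))  # 0.3413 0.1359
```

The interval nearer the mean carries more probability, so \(P(-1 \leq X \leq 0) > P(1 \leq X \leq 2)\).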

The Standard Normal Distribution

  • Normal distribution with mean 0 and variance 1 (and standard deviation = 1)
  • In mathematical notation, we refer to the standard normal random variable as \(Z\) and write it as:

\[ Z \sim N\textrm{(0, 1)} \]

  • Note this \(Z\) has nothing to do with confounding variables
  • \(Z\) has two useful properties…

First, since \(Z\) is symmetric and centered at 0:

\[ \textrm{P}(Z\leq{-}\textrm{z}) = \textrm{P}(Z \geq \textrm{z}) \qquad \qquad \textrm{ (where } \textrm{z} \geq 0) \]

Second, about 95% of the observations of \(Z\) are between -1.96 and 1.96:

\[ \textrm{P}(\textrm{-1.96} \leq Z \leq\textrm{1.96}) \approx \textrm{0.95} \]
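Both properties can be checked numerically (a sketch using `statistics.NormalDist`; z = 1.5 is an arbitrary example value):

```python
from statistics import NormalDist

Z = NormalDist(0, 1)

# Symmetry: P(Z <= -z) equals P(Z >= z) for any z >= 0
z = 1.5
print(round(Z.cdf(-z), 4), round(1 - Z.cdf(z), 4))  # equal

# About 95% of the probability mass lies between -1.96 and 1.96
print(round(Z.cdf(1.96) - Z.cdf(-1.96), 4))  # close to 0.95
```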

How to Transform a Normal Random Variable Into the Standard Normal Random Variable

\[\begin{align*} \textrm{if } X \sim N(\mu, \sigma^{2})\textrm{, } \frac{X-\mu}{\sigma} \sim N\textrm{(0, 1)} \end{align*}\]

Example, if \(X \sim N(4,25), \frac{X-?}{?}\sim N\textrm{(0, 1)}\)

\[ \textrm{Answer: }\frac{X - \textrm{4}}{\textrm{5}} \sim N\textrm{(0,1}) \]
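We can confirm that the standardized variable gives the same probabilities as the original one (a sketch with `statistics.NormalDist`; the threshold 9 is an arbitrary example value):

```python
from statistics import NormalDist

# If X ~ N(4, 25), then sigma = sqrt(25) = 5 and (X - 4)/5 ~ N(0, 1)
X = NormalDist(mu=4, sigma=5)
Z = NormalDist(mu=0, sigma=1)

# P(X <= 9) must equal P(Z <= (9 - 4)/5) = P(Z <= 1)
print(round(X.cdf(9), 4), round(Z.cdf(1), 4))  # both 0.8413
```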

Population Parameters vs. Sample Statistics

  • When we analyze data, we are usually interested in the value of a parameter at the population level
    • Example: proportion of candidate A supporters among all voters in a country
  • We typically only have access to statistics from a small sample of observations drawn from the target population
    • Example: proportion of supporters among the voters who responded to a survey
  • The sample statistics differ from the population parameters because the sample contains noise
    • This noise comes from sampling variability

Sampling variability

  • Refers to the fact that the value of a statistic varies from one sample to another because each sample contains a different set of observations drawn from the target population
  • This is true even when the samples are drawn using exactly the same method, such as random sampling
  • Smaller sample size generally leads to greater sampling variability
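Sampling variability can be made concrete with a simulation (a sketch; the population proportion p = 0.52 and the two sample sizes are assumptions for illustration):

```python
import random

random.seed(7)
p = 0.52  # hypothetical population proportion

def sample_proportion(n):
    """Draw one random sample of size n and return the sample proportion."""
    return sum(random.random() < p for _ in range(n)) / n

# Repeat the sampling many times for two sample sizes and compare
# how much the sample proportions vary from sample to sample
variances = {}
for n in (100, 1000):
    props = [sample_proportion(n) for _ in range(2000)]
    m = sum(props) / len(props)
    variances[n] = sum((x - m) ** 2 for x in props) / len(props)
    print(n, round(variances[n], 5))
```

The smaller sample size shows roughly ten times the sampling variability of the larger one.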

What proportion of US voters supports candidate A?

  • If we draw a random sample from the population over and over again, we will get different proportions of support (\(\overline{X}\))
    • Again, this is due to sampling variability

  • How can we figure out what we want to know: the proportion of support among the whole population?
  • The two large sample theorems—the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT)—help us understand the relationship between population parameters and sample statistics
  • As we will see next class, we can use the CLT to draw conclusions about population parameters using data from just a sample
  • Let’s define the different terms…

  • Population Parameters: population characteristics that we might be interested in knowing
    • \(E(X)\) (expectation of X): population mean of the random variable X
    • \(V(X)\) (variance of X): population variance of the random variable X
  • Sample Statistics: what we can measure by drawing a sample of n observations from the population. (Problem: they vary from sample to sample; they contain noise)
    • \(\overline{X}\) (sample mean of X): average value of X in a particular sample
    • \(var(X)\) (sample variance of X): variance of X in a particular sample

The Law of Large Numbers

As the sample size increases, the sample mean of \(X\) approximates the population mean of \(X\)

\[ \textrm{as } n \textrm{ increases, }\,\,\,\, \overline{X} = \frac{\sum^{n}_{i{=}\textrm{1}} X_i}{n} \,\,\,\,\approx \,\,\,\, E (X) \]

  • Example in the book: Proportion of support in a sample of 1 million observations is likely to be closer to the proportion of support in the whole population than the proportion of support in a sample of 1 thousand observations
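The LLN is easy to see in a quick simulation (a sketch; the Bernoulli population with \(E(X) = 0.6\) is an assumption for illustration):

```python
import random

random.seed(3)
p = 0.6  # population mean E(X) of a Bernoulli variable

# As n increases, the sample mean drifts toward E(X) = 0.6
for n in (100, 10_000, 1_000_000):
    xbar = sum(random.random() < p for _ in range(n)) / n
    print(n, round(xbar, 4))
```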

The Central Limit Theorem

As the sample size increases, the standardized sample mean of \(X\) can be approximated by the standard normal distribution

\[ \textrm{as } n \textrm{ increases, }\,\,\,\, \frac{\overline{X}-E(X)}{\sqrt{V(X)/n}} \,\,\, \stackrel{\textrm{approx.}}{\sim} \,\,\, N \textrm{(0, 1)} \]

  • Example in the book: Even when the random variable we draw from is binary, if we draw repeated large samples, the standardized sample means will approximately follow a standard normal distribution

  • If we drew multiple samples of 1,000 observations from a random variable, compute the sample mean for each sample, and observed that the sample means were centred at 10 with variance 0.002, what would be your best guess for:
  • \(E(X)\) (the population mean of the random variable)?
    • Answer: \(E(X)\) \(\approx\) 10
    • Recall: mean of \(\overline{X} \approx E(X)\)
  • \(V(X)\) (the population variance of the random variable)?
    • Answer: \(V(X)\) \(\approx\) 2
    • Recall: variance of \(\overline{X}\) \(\approx\) \(V(X)\)/n; so 0.002 \(\approx\) \(V(X)\)/1,000, and thus \(V(X) \approx\) 1,000*0.002 \(\approx\) 2
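The CLT, and the logic of the quiz above, can be checked by simulation. A sketch, assuming a Bernoulli population with p = 0.3, samples of n = 1,000, and 5,000 repetitions:

```python
import random

random.seed(11)
p, n, reps = 0.3, 1000, 5000
EX, VX = p, p * (1 - p)  # E(X) = p and V(X) = p(1-p) for a Bernoulli

# Compute the standardized sample mean for each repeated sample
zs = []
for _ in range(reps):
    xbar = sum(random.random() < p for _ in range(n)) / n
    zs.append((xbar - EX) / (VX / n) ** 0.5)

# CLT: the standardized means behave like draws from N(0, 1),
# so about 95% should fall between -1.96 and 1.96
share = sum(-1.96 <= z <= 1.96 for z in zs) / reps
print(round(share, 3))  # close to 0.95
```

Multiplying the variance of the sample means by n recovers \(V(X)\), which is exactly the step used in the quiz above.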

Today’s Class

  • Probability

  • Events, Random Variables, and Probability Distributions

  • Population Parameters vs. Sample Statistics

  • Law of Large Numbers and Central Limit Theorem

Next Class

  • Hypothesis Testing with Coefficients (We will use the CLT to determine whether an average causal effect is likely to be different from zero at the population level)