POST 8000: Conditional Probability and Random Variables

Will Horne

Admin Stuff (1)

  • Problem Set will be posted online EOD Wednesday 9/18

    • Due Sunday 9/29 (11:59:59). Upload to Blackboard

    • Theoretical and Analytical Questions can be word or PDF

    • Please also submit code as an .rmd or .qmd.

    • Please Please Please - don’t use AI beyond what is specified in syllabus

    • I am willing to review code once. Will not give full answers, but will point in right direction.

Admin Stuff (2)

  • Final Paper:

    • Topic of your choosing

      • Must be publicly available data for you to analyze
    • Analysis Plan + Data Analysis

    • Topic by 10/7 - 1 page memo identifying data source + importance of project

    • Must schedule a meeting with me by 10/7 to discuss the project and confirm its feasibility

  • Take Home Mid-Term week of 10/14 (No class, Holiday)

Conditional Probability

Conditional probability is the heart of modern statistics and social science. IMO - getting a firm grasp on conditional probability is the most important thing we do this semester!

Unconditional Probability: What is the probability that A occurs?

Conditional Probability: If we know B has occurred, what is the probability that A occurs (not necessarily sequential)?

We condition our estimate of A on B having occurred.

Could spend a whole semester on conditional probability and fun examples.

Drawing Cards

  • Consider a standard 52 card deck. Let A be the event of drawing a Spade and B be the event of drawing a red card.

  • What is P(A)?

  • What is P(B)?

  • What is P(A|B)?

  • What is P(B|A)?
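
One way to check answers to the card questions is to enumerate the deck and count. The sketch below assumes a single card drawn uniformly at random; the helper name `prob` is ours, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
deck = list(product(ranks, suits))


def prob(event, given=lambda card: True):
    """P(event | given) for one card drawn uniformly from the deck."""
    conditioning = [card for card in deck if given(card)]
    favorable = [card for card in conditioning if event(card)]
    return Fraction(len(favorable), len(conditioning))


def is_spade(card):
    return card[1] == "Spades"


def is_red(card):
    return card[1] in ("Hearts", "Diamonds")


p_a = prob(is_spade)                        # 1/4
p_b = prob(is_red)                          # 1/2
p_a_given_b = prob(is_spade, given=is_red)  # 0: no red card is a spade
p_b_given_a = prob(is_red, given=is_spade)  # 0: no spade is red

print(p_a, p_b, p_a_given_b, p_b_given_a)
```

Conditioning here is nothing more than shrinking the list of cards we count over.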

Social Science Examples

  • What is the probability of two states going to war if they are both democracies?

  • What is the probability of a recession in 2025 if the unemployment rate rose in 2024?

  • What is the probability of a military coup in Brazil if consumer prices double?

  • Examples of relevance for you?

Definition

If \(P(B)\) > 0, then we define the conditional probability of A given B as:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

How often A and B jointly occur, divided by how often B occurs. Why do we need to divide by \(P(B)\)?

Example

What do we think \(P(\text{Policy Nerd} | \text{POST PhD Student})\) is?

What about \(P(\text{POST PhD Student}|\text{Policy Nerd})\)?

It is often assumed that \(P(\text{POST PhD Student}|\text{Policy Nerd})\) should also be high. This mistake is an instance of the base rate fallacy. With cohorts under 10, it simply cannot be high!

Intuition

Weather

Let A = Presence of clouds and B = Rain

What is \(P(A|B)\)? (Don’t over-think it - we don’t need math here!)

Does that mean \(P(B|A)\) is 1 too?

What about \(P(B|A^{c})\)?

Senate example

\[\begin{array}{|c|c|c|c|c|} \hline & \textbf{Democrats} & \textbf{Republicans} & \textbf{Independents} & \textbf{Total} \\ \hline \textbf{Men} & 33 & 40 & 2 & 75 \\ \hline \textbf{Women} & 15 & 9 & 1 & 25 \\ \hline \textbf{Total} & 48 & 49 & 3 & 100 \\ \hline \end{array}\]

Choose one senator at random from this population. What is the probability a randomly selected Democrat is a woman?

What is the probability that a randomly selected woman is a Republican?
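
The two Senate questions can be checked by applying the definition of conditional probability directly to the table counts. This is a minimal sketch; the `conditional` helper is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

# Cell counts copied from the Senate table: (gender, party) -> senators.
counts = {
    ("Men", "Democrats"): 33,
    ("Men", "Republicans"): 40,
    ("Men", "Independents"): 2,
    ("Women", "Democrats"): 15,
    ("Women", "Republicans"): 9,
    ("Women", "Independents"): 1,
}


def conditional(event, given):
    """P(event | given) = count(event and given) / count(given)."""
    joint = sum(n for cell, n in counts.items() if event(cell) and given(cell))
    marginal = sum(n for cell, n in counts.items() if given(cell))
    return Fraction(joint, marginal)


p_woman_given_dem = conditional(
    event=lambda cell: cell[0] == "Women",
    given=lambda cell: cell[1] == "Democrats",
)  # 15/48 = 5/16

p_rep_given_woman = conditional(
    event=lambda cell: cell[1] == "Republicans",
    given=lambda cell: cell[0] == "Women",
)  # 9/25

print(p_woman_given_dem, p_rep_given_woman)
```

Note that the denominator changes with the conditioning event: 48 Democrats in the first case, 25 women in the second.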

“Fun” Interactive

Conditional Probabilities are Probabilities

  • Conditional Probabilities are valid probability functions

  • All the Axioms of probability are satisfied

  • P(A|A) = 1

  • But, why isn’t this true?\[P(A|B \cup C) = P(A|B) + P(A|C)\]
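
A small counterexample can make the last point concrete. The sketch below assumes one roll of a fair die; the specific events A, B, C, A1 and A2 are our own choices for illustration.

```python
from fractions import Fraction


def cond(a, b):
    """P(a | b) for equally likely die faces: |a and b| / |b|."""
    return Fraction(len(a & b), len(b))


A = {2, 4, 6}   # roll is even
B = {2, 4}
C = {4, 6}

lhs = cond(A, B | C)          # P(A | B or C) = 1
rhs = cond(A, B) + cond(A, C)  # 1 + 1 = 2, not even a valid probability

# Additivity does hold for disjoint events to the LEFT of the bar.
A1, A2 = {2}, {4}
left_additive = cond(A1 | A2, B) == cond(A1, B) + cond(A2, B)

print(lhs, rhs, left_additive)
```

The axioms apply to the event being measured, with the conditioning event held fixed. Changing what sits to the right of the bar changes the probability function itself.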

Joint Probabilities

  • The probability of the intersection of two events

    • Either written \(P(A\cap B)\) or \(P(A,B)\)
  • If we think through conditional prob definition, it implies\[P(A,B) = P(A)P(B|A) = P(B)P(A|B)\]

  • We can generalize to joint probability for arbitrarily many events\[P(A_{1},...,A_{n}) = P(A_{1})P(A_{2}|A_{1})P(A_{3}|A_{1},A_{2}) \cdots P(A_{n}|A_{1},...,A_{n-1})\]

Law of Total Probability (LoTP)

  • You may have heard there is an election in 2024. Suppose we know the proportion of Trump and Harris supporters in each city in Georgia.

  • How can we use this information to work out state wide support for each candidate?

    • All of the cities together make up a partition of the state

    • In technical terms, a partition is a set of mutually disjoint events whose union makes up the sample space.

Formalizing LoTP

  • The law of total probability says that if \(A_{1}, ... , A_{k}\) is a partition\[P(B) = \sum_{j = 1}^{k}P(B|A_{j})P(A_{j})\]

  • In practical terms, what does this mean?

  • How do we use this to work out the probability of a random Georgia voter supporting Harris?

Simple Georgia

  • Imagine Georgia has 3 cities, Atlanta, Helen, and Macon.

  • P(Harris|Atlanta) = 0.60, P(Harris|Helen) = 0.1, P(Harris|Macon) = 0.15.

    • Looks kind of bad for Harris!
  • But, we need to consider populations. Atlanta has 500,000 voters, Macon has 80,000 voters and Helen has 20,000 voters.

    • So, P(Atlanta) ≈ .83, P(Macon) ≈ .13, P(Helen) ≈ .03 (rounded)
  • How do we put this all together with LoTP?
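
The three-city calculation can be sketched as follows. It uses the voter counts and conditional support levels from the slide and assumes these three cities are the whole state, so they form a partition.

```python
# Voter counts and P(Harris | city) as given on the slide.
voters = {"Atlanta": 500_000, "Macon": 80_000, "Helen": 20_000}
p_harris_given_city = {"Atlanta": 0.60, "Macon": 0.15, "Helen": 0.10}

# P(city): the share of all voters who live in each city.
total_voters = sum(voters.values())
p_city = {city: n / total_voters for city, n in voters.items()}

# Law of total probability: P(Harris) = sum_j P(Harris | city_j) P(city_j).
p_harris = sum(p_harris_given_city[city] * p_city[city] for city in voters)

print(round(p_harris, 3))  # about 0.523
```

Harris trails badly in two of the three cities yet leads statewide, because the weights P(city) are dominated by Atlanta.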

Elder Girl Example

The Smiths have two children. The older child is a girl. What is the probability that both children are girls (assume gender/sex is binary here for simplification, and birth rates are equal for both sexes)?

1/2 - why?

The Smiths have two children. At least one of them is a boy. What is the probability that both children are boys?

1/3 - why?
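
Both answers can be verified by listing the four equally likely families and conditioning on what we are told. This is a sketch under the slide's assumptions (binary sex, equal birth rates, independent births).

```python
from fractions import Fraction
from itertools import product

# Each family is (older child, younger child); all four are equally likely.
families = list(product("BG", repeat=2))


def cond(event, given):
    """P(event | given) over the equally likely families."""
    kept = [fam for fam in families if given(fam)]
    hits = [fam for fam in kept if event(fam)]
    return Fraction(len(hits), len(kept))


# Older child is a girl: GB and GG remain, so the answer is 1/2.
p_two_girls = cond(lambda fam: fam == ("G", "G"), lambda fam: fam[0] == "G")

# At least one boy: BB, BG and GB remain, so the answer is 1/3.
p_two_boys = cond(lambda fam: fam == ("B", "B"), lambda fam: "B" in fam)

print(p_two_girls, p_two_boys)
```

The difference comes entirely from how many families survive the conditioning: two in the first question, three in the second.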

Monty Hall

  • Possibly the most famous conditional probability problem.

    • Misnamed - arises from a letter to a columnist, Marilyn vos Savant, rather than from Monty Hall’s Show
  • Imagine you are a contestant on a game show. You can choose between three doors, and receive the prize behind the door

  • Two doors have a goat behind them, one has a car. Monty knows where the car is, and opens one door such that he never reveals a car. You then have the option to switch doors

  • If you want to win a car, what should you do?
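
If the answer feels wrong, a simulation can help. The sketch below assumes the car and the first pick are uniformly random, and that Monty chooses at random when he has two goat doors available; that last detail is our assumption, not something the slide specifies. The seed and trial count are arbitrary.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=8000):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}  switch: {switch_rate:.3f}")
```

The two rates should land close to 1/3 and 2/3. Staying wins only when the first pick was the car, and Monty's reveal does nothing to change that initial 1/3.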

Bayes Rule

Imagine you are a public health regulator. A pharmaceutical rep comes to you with a fantastic new cancer screening test that detects a deadly cancer early 99% of the time.

What’s more, it has a low false positive rate, only 3%.

The manufacturers want you to approve the screening test and recommend regular screenings for the public. What should you do?

Many (Most?) people would say go ahead and approve it, but this commits the base rate fallacy by ignoring how rare the cancer is

Bayes Rule Intuition

  • Fortunately, most people do not have cancer at any given point in time

    • Specific cancers are even rarer
  • Imagine that the population prevalence of the specific cancer is 1 in 1,000 at any given point in time.

    • This would make it an extremely common cancer, 350,000 US cases a year
  • Imagine we give a random person the cancer screening, and it comes back positive. What is the probability that they actually have the disease?

Bayes Rule

  • Reverend Thomas Bayes (1701-1761): English Minister and Statistician

    • Entire branch of statistics, Bayesian Statistics, is named after him!
  • Bayes Rule: if \(P(B) > 0\) then \[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

Expand Bayes Rule

  • We can expand this out to
    \[P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^{c})P(A^{c})}\]

  • Denominator follows from LoTP (do you see why?)

  • We call the resulting P(A|B) our Posterior Probability

Back to our example

  • So, what is the probability that the patient actually has cancer?

  • We have some prior information

    • P(Cancer) = .001

    • P(Positive|Cancer) = 0.99

    • P(Positive|No Cancer) = 0.03

  • Apply the formula! Public Health decisions are complicated!
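
Plugging the three numbers into the expanded form of Bayes Rule can be sketched as below. The function name `posterior` is a hypothetical helper introduced for illustration.

```python
def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """P(A | positive) via Bayes Rule with the LoTP denominator."""
    numerator = p_pos_given_a * prior
    evidence = numerator + p_pos_given_not_a * (1 - prior)
    return numerator / evidence


p_cancer_given_positive = posterior(
    prior=0.001,             # P(Cancer)
    p_pos_given_a=0.99,      # P(Positive | Cancer)
    p_pos_given_not_a=0.03,  # P(Positive | No Cancer)
)

print(round(p_cancer_given_positive, 3))  # about 0.032
```

Even with a very accurate test, only about 3 in 100 positive results correspond to actual cancer, because the false positives from the 99.9% of healthy people swamp the true positives.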

Prosecutor’s Fallacy

Imagine a murder is committed, and analysis of the crime scene shows that the murderer has a rare blood type shared by only 5% of the population.

Imagine a suspect is given a blood test - and is shown to share that blood type. The prosecutor argues that this establishes a 95% chance that the suspect is the murderer.

If you were on the jury - how would you evaluate that piece of evidence?
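
One way to evaluate the claim is to make the prior explicit. The sketch below is built on assumptions that are not in the slide: that before the blood test every person in some pool of possible culprits is equally likely to be the murderer, and that the pool sizes shown are purely illustrative.

```python
def posterior_guilt(pool_size, match_rate=0.05):
    """P(guilty | blood type matches), assuming a uniform prior over the pool.

    pool_size is a hypothetical number of people who could have committed
    the crime; match_rate is the population share with the blood type.
    """
    prior = 1 / pool_size
    p_match_given_guilty = 1.0  # the murderer certainly has this blood type
    p_match = p_match_given_guilty * prior + match_rate * (1 - prior)
    return p_match_given_guilty * prior / p_match


for pool_size in (2, 100, 10_000):
    print(pool_size, round(posterior_guilt(pool_size), 4))
```

The prosecutor's 95% is roughly right only if there were just two possible culprits to begin with. In a city-sized pool, a match alone leaves the suspect very unlikely to be the murderer, since 5% of everyone else matches too.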

Applying Bayes Rule to Paul

  • Recall that we determined that the probability of Paul selecting all 8 games correctly was .0025

  • But…what do we think the base rate of prophetic octopuses is?

    • Even if we let it be incredibly unlikely, rather than 0, Paul is most likely not a prophet.
  • What are some other possible uses of Bayes Rule?

Simpson’s Paradox

Consider two doctors, Dr. Nick and Dr. Hibbert who both operate in Springfield. They each offer two types of surgeries: Heart Surgery and Band-Aid Removal. Each surgery can be a success or a failure.

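\[\begin{array}{|c|c|c|} \hline \textbf{Dr. Hibbert} & \textbf{Heart} & \textbf{Band-Aid} \\ \hline \textbf{Success} & 70 & 10 \\ \hline \textbf{Failure} & 20 & 0 \\ \hline \end{array}\]

\[\begin{array}{|c|c|c|} \hline \textbf{Dr. Nick} & \textbf{Heart} & \textbf{Band-Aid} \\ \hline \textbf{Success} & 2 & 81 \\ \hline \textbf{Failure} & 8 & 9 \\ \hline \end{array}\]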

Dr. Nick has a higher success rate (83%) than Dr. Hibbert (80%). Which doctor would you prefer to use?
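
Before choosing, it helps to condition on the type of surgery. The sketch below recomputes the overall success rates and the rates within each surgery type from the counts above.

```python
# (successes, failures) for each doctor and surgery type, from the tables.
results = {
    "Dr. Hibbert": {"Heart": (70, 20), "Band-Aid": (10, 0)},
    "Dr. Nick": {"Heart": (2, 8), "Band-Aid": (81, 9)},
}


def success_rate(successes, failures):
    return successes / (successes + failures)


overall = {}
by_surgery = {}
for doctor, surgeries in results.items():
    total_successes = sum(s for s, f in surgeries.values())
    total_failures = sum(f for s, f in surgeries.values())
    overall[doctor] = success_rate(total_successes, total_failures)
    by_surgery[doctor] = {
        name: success_rate(s, f) for name, (s, f) in surgeries.items()
    }

for doctor in results:
    print(doctor, round(overall[doctor], 3), by_surgery[doctor])
```

Dr. Hibbert has the higher success rate for heart surgery (70/90 vs 2/10) and for Band-Aid removal (10/10 vs 81/90). Dr. Nick only looks better overall because almost all of his cases are the easy surgery.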

Independence

  • Bayes Rule tells us how knowing B changes the probability of A.

  • Sometimes, knowing B tells us nothing about A!

  • Formally, two events A and B are independent (or A \(\perp\) B) if \(P(A \cap B)\) = \(P(A)P(B)\). Why??

    • Why is \(\perp\) an intuitive symbol for independence?

    • Independence is symmetric (A \(\perp\) B implies B \(\perp\) A)

    • If events are not independent, they are dependent

An important consequence

If \(A \perp B\) and \(P(B) > 0\), then:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

from the definition of conditional probability, and then…

\[ = \frac{P(A)P(B)}{P(B)} \]

And if we do a little algebra…

\[= P(A)\]


10 Coins Example (Sequential)

Imagine your friend has 10 fair coins she will flip one after the other, showing you the result each time.

She flips each of the first 9 coins. Each time, it comes up heads.

What is the probability that the 10th coin will come up heads? Why?

10 Coins Example (Non-Sequential)

Instead, imagine your friend has 10 fair coins that she flips, and then chooses 9 coins to show you.

After flipping all the coins, she reveals 9 heads, reserving one already flipped coin.

What is the probability that the 10th coin is also heads? Why?

Quick Question

Are disjoint events independent - dependent - or does it depend?

Independence and Sampling

Suppose that the current prevalence of COVID in South Carolina is 1.5%.

If we sample 20 random people, what is the likelihood that at least one has COVID?

What if the first person we sample has COVID - what is the probability at least one of the remaining people has COVID?

Note - sampling with replacement is independent. Sampling without replacement adds dependency, although the dependency is negligible when the population is much larger than the sample.
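
Both sampling questions can be sketched with the complement rule. The calculation assumes each sampled person independently has COVID with probability 0.015, which is a reasonable approximation when 20 people are drawn from a state-sized population.

```python
prevalence = 0.015
sample_size = 20

# P(at least one case) = 1 - P(no cases) = 1 - (1 - p)^n under independence.
p_at_least_one = 1 - (1 - prevalence) ** sample_size

# If draws are independent, learning the first person has COVID tells us
# nothing about the other 19, so we apply the same rule to n - 1 people.
p_at_least_one_of_rest = 1 - (1 - prevalence) ** (sample_size - 1)

print(round(p_at_least_one, 3), round(p_at_least_one_of_rest, 3))
```

The two values come out near 0.26 and 0.25. The second is slightly smaller only because there are 19 people left rather than 20, not because the first result changed anyone's risk.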

Conditional Independence

  • Two events, A and B, are conditionally independent given C if \[P(A\cap B|C) = P(A|C)P(B|C)\]

  • This is a very important concept once we get to regression

  • Independence does not imply conditional independence

    • Can we think of examples?

Conditional Independence and College Admissions

Consider undergrad admission to a prestigious public university in South Carolina that happens to have a good football team. Assume that high school GPA and football talent are independent in the general population.

This public university wants both good students and good athletes, so they value both GPA and football talent in their admissions process. Among admitted students, would we expect GPA and football talent to be independent?
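
A simulation can show what happens among admitted students. Everything specific here is a made-up assumption for illustration: standardized GPA and football talent are drawn as independent standard normals, and the hypothetical admissions rule admits anyone whose sum of the two exceeds 1.5.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(8000)  # arbitrary seed for reproducibility

# Independent by construction: (GPA score, football score) per applicant.
applicants = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50_000)]

# Hypothetical admissions rule: admit if the combined score is high enough.
admitted = [(gpa, fb) for gpa, fb in applicants if gpa + fb > 1.5]

corr_all = pearson(*zip(*applicants))
corr_admitted = pearson(*zip(*admitted))
print(round(corr_all, 3), round(corr_admitted, 3))
```

Among all applicants the correlation is close to zero, but among the admitted it is clearly negative: an admitted student with a weak GPA must have gotten in on football talent, and vice versa.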

Conditional Independence and Collider Bias

  • Also closely related to collider bias

  • Consider a regression model (University GPA ~ SAT Score), showing no association between GPA and SAT scores

    • We are conditioning on acceptance to university.

Graphing Collider Bias

Does height make NBA players worse shooters?

Random Variables

  • First of all - I did not name these!

  • Random variables provide a link between probability and data

  • A Random Variable is a function that maps from the sample space to the real number line.

    • A numeric representation of uncertain events

    • Imagine an event A - the numeric value of that event is then X(A) where X is a random variable

  • Randomness comes from the randomness of the “experiment” or event - not from X

An Example - Polling

  • Each poll is an event where X(poll) is % support for Harris.

  • Imagine polls coming from a distribution centered around the true support.

  • In practice, we don’t know the true level of support. X is then the sample mean of support for Harris in each poll.

    • This is the core of what FiveThirtyEight and similar are doing for their election forecasts.

      • What true level of support is most likely given the observed polling data (plus some other factors).

Random Variables vs Outcomes

  • For any given ‘experiment’, there can be many different random variables

  • Imagine randomly sampling university students, where we measure their class (Freshman, Sophomore, Junior, Senior)

    • with 2 students, there are 10 possible outcomes when order does not matter (FF, FSo, FJ, FSe, SoSo, SoJ, SoSe, JJ, JSe, SeSe)

    • Random Variable could be number of Freshmen

    • Or..number of Juniors + Seniors

    • Or..number of non-Seniors
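
To see that a random variable is just a function on outcomes, the sketch below lists the unordered two-student outcomes and applies the three random variables to each. The function names are ours.

```python
from itertools import combinations_with_replacement

classes = ["F", "So", "J", "Se"]

# Unordered pairs of class years for 2 sampled students: 10 outcomes.
outcomes = list(combinations_with_replacement(classes, 2))


# Three different random variables defined on the same sample space.
def n_freshmen(outcome):
    return outcome.count("F")


def n_juniors_plus_seniors(outcome):
    return outcome.count("J") + outcome.count("Se")


def n_non_seniors(outcome):
    return len(outcome) - outcome.count("Se")


for outcome in outcomes:
    print(
        "".join(outcome),
        n_freshmen(outcome),
        n_juniors_plus_seniors(outcome),
        n_non_seniors(outcome),
    )
```

Nothing in these functions is random. The randomness comes from which outcome the sampling produces, and each function simply reports a number for it.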

Types of Random Variables

  • Discrete and Continuous

    • Today + Next Week, Discrete

    • Closely related, but different techniques and tests (ie Logit vs OLS vs Poisson Regressions) based on the type of RV

  • Definition: A Random Variable is discrete if the set of values it takes with positive probability is finite or countably infinite

    • Includes binary variables (0/1), categorical (ordered or unordered) variables, as well as count variables

What’s Random Here?

  • Uncertainty over the sample space –> uncertainty over the value of X

  • The distribution of a random variable specifies that uncertainty

    • Specifically, it gives you the probabilities of all possible events

    • X = number of days a randomly chosen student was absent from school

    • Distribution tells you - What is P(X > 10)? What is P(X = 0)?

Simple Example

Consider flipping 4 fair coins, where X is the number of heads.

What is P(X = 1)?

What about P(X = 2 or 3)?

What about P(X > 4)?
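
These three probabilities can be checked by counting. With 4 fair coins there are 16 equally likely sequences, and the number with exactly k heads is "4 choose k"; the sketch below builds the full distribution of X that way.

```python
from fractions import Fraction
from math import comb

n_coins = 4

# P(X = k) = C(4, k) / 2^4 for k = 0, 1, 2, 3, 4.
pmf = {k: Fraction(comb(n_coins, k), 2 ** n_coins) for k in range(n_coins + 1)}

p_one = pmf[1]                                             # 1/4
p_two_or_three = pmf[2] + pmf[3]                           # 5/8
p_more_than_four = sum(p for k, p in pmf.items() if k > 4)  # 0

print(pmf)
print(p_one, p_two_or_three, p_more_than_four)
```

P(X > 4) is zero because no value above 4 is in the set of values X can take.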

Distribution of X

Terminology (Probability Mass Function)

  • The Probability Mass Function (PMF) is specified as:\[p_{x}(x) = P(X = x)\]

  • X = x is an event

  • The support of X for a discrete random variable is the values for which it has a positive probability

  • What does all this mean in plain language?

Characteristics of a Valid PMF

  • A valid PMF with support \(x_{1}, x_{2},...\) has the following properties

    • Non-Negativity: \(p_{x}(x) > 0\) if \(x \in \{x_{1}, x_{2}, ...\}\) and \(p_{x}(x) = 0\) otherwise

    • Sums to 1: \(\sum_{j} p_{x}(x_{j}) = 1\), where the sum runs over the whole support

    • The probability of any set of values S in \(\{x_{1}, x_{2}, ...\}\) is \[P(X\in S) = \sum_{x \in S} p_{x}(x)\]

PMF Examples

Imagine you are designing an experimental educational policy intervention with 3 treatment conditions (Control, T1 and T2).

Good randomization is the foundation of experimental inference, so you decide to randomize into conditions by flipping four coins. Let X be the number of heads. If X is 0 or 1 you assign the school to the control. If X is 2 you assign it to T1, and if X >2, you assign it to T2.

What does the PMF of X look like? Would you use this randomization technique?
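
To judge the scheme, we can push the PMF of X through the assignment rule. The sketch below enumerates all 16 coin sequences and tallies how often each condition is assigned; the function name `condition` is ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def condition(heads):
    """Assignment rule from the slide, as a function of the number of heads."""
    if heads <= 1:
        return "Control"
    if heads == 2:
        return "T1"
    return "T2"


# All 2^4 equally likely sequences of four fair coin flips.
flips = list(product("HT", repeat=4))

counts = Counter(condition(seq.count("H")) for seq in flips)
assignment = {arm: Fraction(n, len(flips)) for arm, n in counts.items()}

print(assignment)  # Control 5/16, T1 3/8, T2 5/16
```

The three conditions are not equally likely: T1 is assigned 6 times in 16, the other two 5 times each. Whether that imbalance is acceptable depends on the design, but it would be easy to miss without writing out the PMF.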