Lecture 8: Conditional Expectation

Will Horne

Merging Data in R

  • Often, when doing research, you end up with data from multiple sources (or in different files from the same source) and you need to merge them

  • Broadly you might need to

    • bind_rows - combine two datasets with the same columns covering different time periods or extra observations

    • left_join - merge new measurements/variables

  • Important to make sure your merge works as expected

    • may require harmonizing data types or renaming variables (see the sketch below)
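A minimal sketch of both operations with made-up data (the data frames unemp_2020, unemp_2021, and gdp are hypothetical, not the course files):

library(dplyr)

# Hypothetical unemployment data split across two files with the same columns
unemp_2020 <- data.frame(state = c("GA", "NC"), year = 2020, unemp = c(6.5, 7.2))
unemp_2021 <- data.frame(state = c("GA", "NC"), year = 2021, unemp = c(3.9, 4.4))

# Hypothetical GDP data; note the key column has a different name
gdp <- data.frame(st   = c("GA", "GA", "NC", "NC"),
                  year = c(2020, 2021, 2020, 2021),
                  gdp  = c(605, 640, 585, 620))

# bind_rows stacks datasets that share columns
unemp <- bind_rows(unemp_2020, unemp_2021)

# left_join merges in new variables; harmonize key names first
merged <- unemp %>%
  rename(st = state) %>%
  left_join(gdp, by = c("st", "year"))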

Try it out

  • Download the data from the course site

  • Bind the rows of the unemployment data and then join it with the GDP data

    • You will likely run into error messages; look at the data and adjust as needed
  • Plot the relationship between GDP and Unemployment (any type of plot is fine)

Where are we now?

  • We have now discussed how univariate distributions generalize to multivariate distributions

    • Joint, Marginal and Conditional Distributions

    • Covariance and Correlation

  • We will discuss a very important quantity today - the conditional expectation

  • Then, we will discuss how to estimate features of a population from a sample (Finally!)

Conditional Expectation

The conditional expectation of (Y|X) is: \[ \mu(\mathbf{x}) = \mathbb{E}[Y \mid \mathbf{X} = \mathbf{x}] = \begin{cases} \sum_{y} y \, P(Y = y \mid \mathbf{X} = \mathbf{x}) & \text{discrete } Y \\ \int_{-\infty}^{\infty} y \, f_{Y \mid \mathbf{X}}(y \mid \mathbf{x}) \, dy & \text{continuous } Y \end{cases} \]

  • This is the expected value of Y given X = x

  • Can be viewed as a function of x, in which case we call it the conditional expectation function (CEF)

    • The CEF tells us how the average value of Y changes for different values of X

Why does conditioning matter?

Fred is a 30-year-old man. If the average life expectancy in Fred’s country is 80 years, should Fred conclude that he has 50 years (on average) of life expectancy remaining?

No! We have some good news for Fred, which is that if we let T be Fred’s lifespan,

\[ E[T] \lt E[T \mid T \geq 30] \]

Of course, we can (and will) get even better estimates for Fred’s lifespan if we condition on other variables (Where does he live? What does he do for work? Does he smoke?)
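A quick way to see the effect of conditioning is by simulation. Assuming (purely for illustration) that lifespans are roughly Normal with mean 80 and standard deviation 15, the conditional mean given survival to age 30 is larger than the unconditional mean:

set.seed(42)

# Hypothetical lifespan distribution (illustrative assumption, not real data)
lifespans <- rnorm(1e6, mean = 80, sd = 15)
lifespans <- lifespans[lifespans > 0]     # drop impossible negative draws

mean(lifespans)                    # approximates E[T]
mean(lifespans[lifespans >= 30])   # approximates E[T | T >= 30], which is larger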

Two Envelope Problem

You are on a game show! Everyone knows how to solve the Monty Hall problem, so the host has changed the puzzle. There are two envelopes, one of which has twice as much money as the other. You can open one envelope, and then choose whether to switch.

You open the envelope, and it has $100 in it. Should you switch?

By the symmetry of the setup, the expectation of each envelope is equal. But, if your envelope has $100 in it, doesn’t that mean the other envelope has equal probability of having either $50 or $200? The expectation would then be $125, so you should switch! Or should you…?

Simulating Sticking or Switching

# Load required libraries
library(ggplot2)

# Simulation parameters
set.seed(30317)            # For reproducibility (my zip code)
n_simulations <- 100000   # Number of simulations

# Function to simulate one round of the game
simulate_game <- function() {
  # Randomly pick a base amount
  X <- sample(1:100, 1) * 10   # Random amount between 10 and 1000
  
  # Assign amounts to envelopes
  envelope1 <- X
  envelope2 <- 2 * X
  
  # Randomly assign which envelope is picked first
  envelopes <- sample(c(envelope1, envelope2))
  first_choice <- envelopes[1]  # Amount in the first envelope chosen
  second_choice <- envelopes[2] # Amount in the other envelope
  
  # Outcomes based on sticking vs switching
  stick_value <- first_choice
  switch_value <- second_choice
  
  return(c(stick_value, switch_value))
}

# Run the simulation
results <- replicate(n_simulations, simulate_game())

# Convert results to a data frame
results_df <- data.frame(
  Strategy = rep(c("Stick", "Switch"), times = n_simulations),  # columns of `results` interleave stick/switch
  Value = as.vector(results)
)

# Plot the results
ggplot(results_df, aes(x = Strategy, y = Value, fill = Strategy)) +
  geom_boxplot() +
  labs(title = "Simulation Results for Two Envelopes Game",
       x = "Strategy",
       y = "Amount in Envelope") +
  theme_minimal()

Simulating Sticking or Switching

(Figure: boxplot of the amounts under the Stick and Switch strategies, produced by the code above)

Coinflip Problem

I saw an interesting example of a problem that we can solve with conditional expectation go viral last spring:

Link

What do you think?

Simulating the coin problem

results
  Alice     Bob     Tie 
0.03296 0.94830 0.01874 

Conditional Expectation Example

Gender            Support Dream Act (Y = 1)    Oppose Dream Act (Y = 0)
Male (X = 1)      0.24                         0.24
Female (X = 0)    0.34                         0.18
  • What is the conditional expectation of Dream Act support Y among men (X = 1)?

\[ E[Y|X = 1] = \sum_{y}yP(Y= y|X = 1) \\ = 0 \times P(Y=0|X=1) + 1\times P(Y = 1|X = 1)\\ = 1 \times \frac{.24}{.24 + .24} = 0.5 \]

When Y is binary, E[Y|X = x] is P(Y = 1|X=x)

CEF for binary X

  • Example

    • Y is the time respondent i waited in line to vote

    • \(X_{i} = 1\) for white voters, \(X_{i} = 0\) for non-white voters.

  • Then the mean in each group is just a conditional expectation:

    • \(\mu(\text{white}) = E[Y_{i}|X_{i} = \text{white}]\)

    • \(\mu(\text{non-white}) = E[Y_{i}|X_{i} = \text{non-white}]\)

Visualizing the CEF

What is the CEF good for?

  • The CEF provides us with information on the relationship between variables

  • In the previous figure, we see that because \(\mu(\text{white}) \lt \mu(\text{non-white})\), waiting times are shorter on average for white voters

  • The CEF describes relationships in the population. A sampling equivalent is coming soon!

Categorical Example of CEF

  • Of course, race is not actually binary (it may not even be categorical, but let’s assume it is for a statistics class!)

    • Y is the time respondent i waited in line to vote

    • \(X_{i} = 1\) for white voters, \(X_{i} = 2\) for black voters, \(X_{i} = 3\) for latino voters, \(X_{i} = 4\) for asian voters, and \(X_{i} = 5\) for voters of other racial groups.

  • Then the mean in each group is still just a conditional expectation: \(\mu(x) = E[Y_{i}|X_{i} = x]\) for each group x

Visualizing Categorical CEF

More than one Covariate

  • We are not limited to conditioning on a single variable when calculating the CEF:

    • \(\mu(\text{white, woman}) = E[Y_{i}|X_{i} = \text{white}, Z_{i} = \text{woman}]\)

    • \(\mu(\text{white, man}) = E[Y_{i}|X_{i} = \text{white}, Z_{i} = \text{man}]\)

    • \(\mu(\text{non-white, woman}) = E[Y_{i}|X_{i} = \text{non-white}, Z_{i} = \text{woman}]\)

    • \(\mu(\text{non-white, man}) = E[Y_{i}|X_{i} = \text{non-white}, Z_{i} = \text{man}]\)

  • Allows us to make more nuanced comparisons, like within-gender differences by race (sketched in code below)

    \[ \mu(\text{white, man}) - \mu(\text{non-white, man}) \]
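As a preview of estimation, here is a sketch of that comparison using sample means; the voters data frame below is simulated and purely hypothetical:

library(dplyr)

# Hypothetical survey of wait times (in minutes)
set.seed(1)
voters <- data.frame(
  race   = sample(c("white", "non-white"), 1000, replace = TRUE),
  gender = sample(c("man", "woman"), 1000, replace = TRUE),
  wait   = rexp(1000, rate = 1 / 20)
)

# Group means are the sample analogues of the conditional expectations
cef_hat <- voters %>%
  group_by(race, gender) %>%
  summarize(mu_hat = mean(wait), .groups = "drop")

# Within-gender difference by race: mu(white, man) - mu(non-white, man)
with(cef_hat, mu_hat[race == "white" & gender == "man"] -
              mu_hat[race == "non-white" & gender == "man"])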

Continuous CEF

  • Imagine we wanted to instead look at the CEF of (Wait Time| Income)

  • \(X_{i}\) can take on an infinite number of values (at least in theory)

  • We are going to have to think about \(\mu\) as a function of X, since we cannot just work out each value of the CEF.

    • We are going to want to work out some estimator \(\hat{\mu}\) to cover all values of X (see the sketch below)
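One way to see what \(\hat{\mu}\) might look like: simulate (hypothetical) income and wait-time data and fit a smoother. The loess line here is just one possible choice of estimator, not the course's official one:

library(ggplot2)
set.seed(2)

# Hypothetical data: wait time declines with income, plus noise
income <- runif(2000, 20, 200)                 # income in $1,000s
wait   <- 60 - 0.2 * income + rnorm(2000, sd = 10)
dat    <- data.frame(income, wait)

# A smooth estimate of mu(x) = E[wait | income = x]
ggplot(dat, aes(income, wait)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", se = FALSE) +
  theme_minimal()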

Wait Times and Income

Extracting the CEF at a point

CEF as a random variable…

  • The conditional expectation is a function of x: \(\mu(x) = E[Y|X = x]\)

    • Not random: for a given x, \(\mu(x)\) is just a number

    • It is the conditional expectation given the event X = x

  • But we can also think about the conditional expectation given a random variable E[Y|X]

    • This is our best prediction of Y given that we get to know X


…with its own expectation and variance

We can obtain E[Y|X] by plugging X into the CEF; for a binary X:

\[ E[Y \mid X] = \begin{cases} \mu(0) \text{ with probability P(X = 0)} \\ \mu(1) \text{ with probability P(X = 1)} \end{cases} \]

It’s a genuine random variable, with a distribution, an expectation \(E[E[Y|X]]\), and a variance \(V[E[Y|X]]\)

Law of Iterated Expectation

What the heck is \(E[E[Y|X]]\)? How do we work with it?

The Law of Iterated Expectation (LIE) says that

\[ \text{If } E[|Y|] \lt \infty \text{, then } E[E[Y|X]] = E[Y] \]

In plain (mathy) language - the expectation of the conditional expectation is the marginal expectation.

And if we condition on a second random variable

\[ E[E[Y|X_{1},X_{2}]|X_{1}] = E[Y|X_{1}] \]

We are just averaging (marginalizing) over what is not held constant, \(X_{2}\)

LIE Example

Gender            Support Dream Act (Y = 1)    Oppose Dream Act (Y = 0)    Marginal of Gender
Male (X = 1)      0.24                         0.24                        0.48
Female (X = 0)    0.34                         0.18                        0.52

E[Y| X = 1] = 0.5 and E[Y| X = 0] = 0.65

P(X = 1) = 0.48 and P(X = 0) = 0.52

LIE Example

Use the LIE

\[ E[E[Y|X]] = E[Y|X = 0]P(X = 0) + E[Y|X = 1]P(X=1) \\ = 0.65 \times 0.52 + 0.5 \times 0.48 \approx 0.58 = E[Y] \]

More on the LIE

We can find the expectation of Y by first considering E[Y|X]

E[Y | X] =

\[\begin{cases} \mu(0) \text{ with probability P(X = 0)} \\ \mu(1) \text{ with probability P(X = 1)} \end{cases}\]

Then we take the expectation of E[Y|X] which is E[E[Y|X]], by calculating the weighted average

\[ E[E[Y|X]] = E[Y|X = 0]P(X = 0) + E[Y|X = 1]P(X = 1) \\ \text{Equivalently: } \mu(0)P(X=0) + \mu(1)P(X=1) \]
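We can check this arithmetic in R using the joint distribution from the table (a quick sketch; the object names are made up):

# Joint distribution P(X, Y): rows are X (1 = male, 0 = female)
joint <- matrix(c(0.24, 0.24,    # X = 1: P(Y = 1), P(Y = 0)
                  0.34, 0.18),   # X = 0: P(Y = 1), P(Y = 0)
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("1", "0"), Y = c("1", "0")))

p_x  <- rowSums(joint)          # marginal P(X): 0.48, 0.52
mu_x <- joint[, "1"] / p_x      # E[Y | X = x] = P(Y = 1 | X = x)

sum(mu_x * p_x)                 # E[E[Y|X]] = 0.58
sum(joint[, "1"])               # E[Y] = P(Y = 1) = 0.58, the same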

A full example

Suppose a policymaker is interested in assessing the impact of a new education program on future earnings. This program, aimed at low-income students, provides additional resources, tutoring, and counseling services to improve educational outcomes. To evaluate the effect, they want to know the expected earnings of individuals who participated in the program.

Let Y represent future earnings.

Let P be an indicator of participation in the program (1 if participated, 0 if not).

Calculating Expectation

Assume 30% of individuals complete the program, and

\[ E[Y|P = 1] = $30,000 \\ E[Y|P = 0] = $40,000 \]

We can back out the overall expectation using the Law of Iterated Expectation

\[ E[Y] = $30,000 \times 0.3 + $40,000 \times 0.7 = $37,000 \]

Foreshadowing next semester: Did the program have a causal effect on student earnings? Or, what assumptions would we need to make?

Properties of the CEF

  • \(E[g(X)Y|X] = g(X)E[Y|X]\) for any function g(X)

  • If X and Y are independent RVs then

    \[ E[Y|X = x] = E[Y] \]

  • If Y and X are conditionally independent given Z, then

    \[ E[Y|X = x, Z=z] = E[Y|Z=z] \]

  • Linearity

\[ E[Y + X|Z] = E[Y|Z] + E[X|Z] \]

CEF Error

We can also write down a measure of the prediction error of the CEF

\[ \epsilon = Y - E[Y|X] \]

It has the following properties:

\(E[\epsilon|X] = 0\) (think about why)

\(E[\epsilon] = 0\)
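To see why: E[Y|X] is a function of X, so it passes through the conditional expectation, and then the LIE handles the unconditional version:

\[ E[\epsilon|X] = E[Y - E[Y|X] \mid X] = E[Y|X] - E[Y|X] = 0 \\ E[\epsilon] = E[E[\epsilon|X]] = E[0] = 0 \]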

We won’t cover the matrix algebra, but E[Y|X] is the projection of Y onto the space of all functions of X, in the sense that E[Y|X] is the function of X that is closest to Y. (See Theorem 9.3.9 in B&H.)

Conditional Expectation as Best Predictor

  • Something we often want to do is to predict Y given some X

    • we can use any function of X, g(X), to do so
  • The mean squared error (MSE) for our predictions is

\[ E[(Y - g(X))^{2}] \]

Best Predictor Continued

  • We want to minimize the MSE. What g(X) does that?

    • The CEF, \(\mu(X)\) does!
  • For any g(X)

    \[ E[(Y - g(X))^{2}] \geq E[(Y - \mu(X))^{2}] \]
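A sketch of why, writing \(\epsilon = Y - \mu(X)\) and expanding the square:

\[ E[(Y - g(X))^{2}] = E[(Y - \mu(X))^{2}] + 2E[\epsilon(\mu(X) - g(X))] + E[(\mu(X) - g(X))^{2}] \]

The middle term is zero because \(E[\epsilon \, h(X)] = E[h(X)E[\epsilon|X]] = 0\) for any function h(X) (by the LIE), and the last term is nonnegative, which gives the inequality.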

Conditional Variance

The conditional variance of (Y|X) is defined as:

\[ \sigma^{2}(x) = Var[Y|X = x] = E[(Y - \mu(x))^{2}|X=x] \]

Variance of Y can be decomposed into

\[ Var(Y) = E[Var[Y|X]] + Var[E[Y|X]] \]

Which can be conceptualized as within group variation \(E[Var[Y|X]]\) and between group variation \(Var[E[Y|X]]\)

Height Example

Suppose we have the heights (in centimeters) of individuals drawn from three different countries:

  • Country 1: 160, 162, 158, 161

  • Country 2: 170, 168, 172, 171

  • Country 3: 180, 179, 181, 182

What is \(Var[height]\)? What is the within group variation? What is the between group variation?
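A sketch of the computation in R, using population variances (dividing by n rather than n - 1) so the two pieces add up exactly:

heights <- c(160, 162, 158, 161,   # Country 1
             170, 168, 172, 171,   # Country 2
             180, 179, 181, 182)   # Country 3
country <- rep(c("1", "2", "3"), each = 4)

pop_var <- function(x) mean((x - mean(x))^2)   # divide by n, not n - 1

total_var   <- pop_var(heights)                         # Var(Y)
within_var  <- mean(tapply(heights, country, pop_var))  # E[Var(Y|X)]; groups are equally sized
group_means <- tapply(heights, country, mean)
between_var <- pop_var(group_means[country])            # Var(E[Y|X])

c(total = total_var, within = within_var, between = between_var)
# total equals within + between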

Skedasticity

Homoskedasticity means variances do not depend on X, such that for all X, \(\sigma^{2}(x) = \sigma^{2}\)

Unless we show that our data is homoskedastic, we should assume that it is instead heteroskedastic. This will matter when calculating standard errors for regressions!
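To see the difference visually, here is a small simulation (hypothetical data): in the heteroskedastic case the spread of Y grows with X.

library(ggplot2)
set.seed(3)

x        <- runif(1000, 0, 10)
y_homo   <- 2 + x + rnorm(1000, sd = 2)         # Var(Y|X) constant
y_hetero <- 2 + x + rnorm(1000, sd = 0.5 * x)   # Var(Y|X) grows with X

dat <- data.frame(x    = c(x, x),
                  y    = c(y_homo, y_hetero),
                  case = rep(c("Homoskedastic", "Heteroskedastic"), each = 1000))

ggplot(dat, aes(x, y)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~ case) +
  theme_minimal()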