Asymptotics and Hypothesis Testing

Will Horne

Where we are

  • We now have ways to estimate various features, including the expectation and variance, of the population from a sample

  • We have nearly all the tools we need to test hypotheses

    • The intuition is to figure out how far out in the CDF of the sampling distribution our estimate falls
  • But we need a way to go from sample estimates to statements about the population without assuming a specific distribution

What we know

  • For iid random variables \(X_{1},...,X_{n}\) with \(E[X_{i}] = \mu\) and \(Var[X_{i}] = \sigma^{2}\)

    • \(\bar{X}_{n}\) is unbiased: \(E[\bar{X}_{n}] = E[X_{i}] = \mu\)

    • Sampling variance: \(Var[\bar{X}_{n}] = \frac{\sigma^{2}}{n}\), where \(\sigma^{2}\) is \(Var[X_{i}]\)

    • Neither of these relies on a specific distribution for X

  • If \(X \sim N(\mu, \sigma^{2})\) then we have the complete distribution and hypothesis testing is straightforward.

    • But we usually don't know the distribution! What then?
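As a quick sanity check, here is a minimal simulation sketch of these two facts. The Uniform(0, 10) distribution is chosen purely for illustration (\(\mu = 5\), \(\sigma^{2} = 100/12\)); nothing about the result depends on that choice.

# Sketch: check E[x-bar] = mu and Var[x-bar] = sigma^2 / n by simulation,
# using Uniform(0, 10) draws (mu = 5, sigma^2 = 100/12), chosen arbitrarily
set.seed(123)
n <- 25
xbars <- replicate(50000, mean(runif(n, min = 0, max = 10)))

mean(xbars)   # close to mu = 5
var(xbars)    # close to sigma^2 / n = (100/12) / 25, about 0.33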

Asymptotics

What can we say about the sample mean as n gets large?

Think about sequences of sample means with increasing n:

\[ \bar{X}_{1} = X_{1}\\ \bar{X}_{2} = \frac{1}{2}(X_{1} + X_{2})\\ \bar{X}_{3} = \frac{1}{3}(X_{1} + X_{2} + X_{3})\\ \vdots\\ \bar{X}_{n} = \frac{1}{n}(X_{1} + X_{2} + X_{3} + ... + X_{n}) \]
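One way to see these sequences concretely is to compute the running mean of a single stream of draws. A minimal sketch, using Exponential(rate = 0.5) draws (true mean 2) to match the simulation code later in these slides:

# Sketch: the running sequence x-bar_1, x-bar_2, ..., x-bar_n
# for one stream of Exponential(rate = 0.5) draws (true mean = 2)
set.seed(42)
x <- rexp(1000, rate = 0.5)
running_means <- cumsum(x) / seq_along(x)

head(running_means)   # early sample means bounce around
tail(running_means)   # later sample means settle near 2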

Asymptotics and Limits

Asymptotic analysis allows us to make approximations about finite sample properties

Definition

A sequence \(\{a_{n}: n = 1, 2, 3,...\}\) has the limit a (\(a_{n} \rightarrow a\)) as \(n \rightarrow \infty\) if for all \(\delta > 0\) there is some \(n_{\delta} < \infty\) such that for all \(n \geq n_{\delta}, |a_{n} - a| \leq \delta\)

This should look familiar! As n gets larger, \(a_{n}\) gets closer and closer to a.

This is called convergence

The sequence is bounded if there is a \(b < \infty\) such that \(|a_{n}| \leq b\) for all n
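A deterministic example may help fix ideas: the sequence \(a_{n} = 1 + \frac{1}{n}\) has limit 1, and for any \(\delta\) we can find the \(n_{\delta}\) from the definition. A minimal sketch:

# Sketch: the sequence a_n = 1 + 1/n converges to a = 1
n <- 1:1000
a_n <- 1 + 1 / n

delta <- 0.01
n_delta <- which(abs(a_n - 1) <= delta)[1]   # first n with |a_n - a| <= delta
n_delta                                      # 100; for all n >= 100 the sequence stays within delta of 1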

Visualizing Convergence

Definition

A sequence \(\{a_{n}: n = 1, 2, 3,...\}\) has the limit a (\(a_{n} \rightarrow a\)) as \(n \rightarrow \infty\) if for all \(\delta > 0\) there is some \(n_{\delta} < \infty\) such that for all \(n \geq n_{\delta}, |a_{n} - a| \leq \delta\)

Making \(\delta\) Smaller

Definition

A sequence \(\{a_{n}: n = 1, 2, 3,...\}\) has the limit a (\(a_{n} \rightarrow a\)) as \(n \rightarrow \infty\) if for all \(\delta > 0\) there is some \(n_{\delta} < \infty\) such that for all \(n \geq n_{\delta}, |a_{n} - a| \leq \delta\)

Convergence in Probability

Definition

A sequence of random variables, \(\{Z_{n}: n = 1,2,...\}\), converges in probability to a value b if for every \(\epsilon > 0\)

\[P(|Z_{n} - b| > \epsilon) \rightarrow 0\]

as \(n \rightarrow \infty\). Written \(Z_{n} \overset{p}{\rightarrow} b\)

  • Probability that \(Z_{n}\) lies outside an arbitrarily small (the smallest interval you can imagine!) interval around b approaches 0 as \(n \rightarrow \infty\)

    • Also written plim(\(Z_{n}\)) = b
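A simulation sketch of this definition: estimate \(P(|\bar{X}_{n} - \mu| > \epsilon)\) for increasing n. The Exponential(rate = 0.5) draws (so \(\mu = 2\)) and the value of \(\epsilon\) are arbitrary choices for illustration.

# Sketch: estimate P(|x-bar_n - mu| > eps) for increasing n,
# using Exponential(rate = 0.5) draws (mu = 2) and an arbitrary eps
set.seed(42)
eps <- 0.25
ns <- c(5, 30, 100, 1000)

sapply(ns, function(n) {
  xbars <- replicate(5000, mean(rexp(n, rate = 0.5)))
  mean(abs(xbars - 2) > eps)   # share of sample means farther than eps from mu
})
# The estimated probabilities shrink toward 0 as n grows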

Consistency

Definition

An estimator is consistent if \(\widehat{\theta}_{n} \overset{p}{\rightarrow} \theta\)

  • Distribution of \(\widehat{\theta}_{n}\) collapses on \(\theta\) as \(n \rightarrow \infty\)

  • Inconsistent estimators are really bad. We don't use them!

    • As we get more data, our estimates do not improve, and may get worse
  • Unbiased means the expectation of the estimator is the parameter. Consistent means we get closer to the parameter as we get more data.

Graphic Plim(X)

Understanding Check: The "first observation" estimator is unbiased. Is it consistent?
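One way to explore the understanding check by simulation (a sketch, again using Exponential(rate = 0.5) draws with true mean 2):

# Sketch: sample mean vs. "first observation" estimator as n grows
# (Exponential(rate = 0.5) draws, so the true mean is 2)
set.seed(42)
for (n in c(10, 100, 1000, 10000)) {
  x <- rexp(n, rate = 0.5)
  cat("n =", n,
      " sample mean =", round(mean(x), 3),
      " first obs =", round(x[1], 3), "\n")
}
# The sample mean settles near 2; the first observation does not improve with n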

Law of Large Numbers

Law of Large Numbers

Let \(X_{1},...,X_{n}\) be n iid draws from a distribution with \(E[|X_{i}|] < \infty\) and mean \(\mu = E[X_{i}]\). Let \(\bar{X}_{n} = \frac{1}{n}\sum_{i = 1}^{n}X_{i}\). Then \(\bar{X}_{n} \overset{p}{\rightarrow} E[X_{i}]\).

  • The probability of \(\bar{X}_{n}\) being far away from \(\mu\) goes to 0 as n gets big.

    • Issue: How large of an n is big enough?
  • Implies that our plug-in estimators (Sample mean and variance, etc) are consistent

    • If \(E[|g(X_{i})|] < \infty\), then \(\frac{1}{n}\sum_{i = 1}^{n}g(X_{i}) \overset{p}{\rightarrow} E[g(X_{i})]\)
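The plug-in claim is easy to check for a particular g. A minimal sketch with \(g(x) = x^{2}\), using Exponential(rate = 0.5) draws, for which \(E[X_{i}^{2}] = Var[X_{i}] + \mu^{2} = 4 + 4 = 8\):

# Sketch: plug-in estimator for E[g(X_i)] with g(x) = x^2
# For Exponential(rate = 0.5): E[X^2] = Var[X] + mu^2 = 4 + 4 = 8
set.seed(42)
x <- rexp(100000, rate = 0.5)
mean(x^2)   # close to 8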

Simulating the LLN

# Set parameters
nsims <- 10000  # Number of simulations
# Exponential(rate = 0.5) draws have true mean 1/rate = 2

# Preallocate a matrix to hold the sample means for each sample size
sample_means <- matrix(NA, nrow = nsims, ncol = 6)

# Perform simulations
for (i in 1:nsims) {
  s5 <- rexp(n = 5, rate = 0.5)
  s15 <- rexp(n = 15, rate = 0.5)
  s30 <- rexp(n = 30, rate = 0.5)
  s100 <- rexp(n = 100, rate = 0.5)
  s1000 <- rexp(n = 1000, rate = 0.5)
  s10000 <- rexp(n = 10000, rate = 0.5)
  
  sample_means[i, 1] <- mean(s5)
  sample_means[i, 2] <- mean(s15)
  sample_means[i, 3] <- mean(s30)
  sample_means[i, 4] <- mean(s100)
  sample_means[i, 5] <- mean(s1000)
  sample_means[i, 6] <- mean(s10000)
}

# Convert holder matrix to data frame for easier plotting
sample_means <- data.frame(n5 = sample_means[, 1], n15 = sample_means[, 2], n30 = sample_means[, 3],
                        n100 = sample_means[, 4], n1000 = sample_means[, 5], n10000 = sample_means[, 6])
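The "Visualizing LLN" slides that follow plot these sampling distributions. A minimal base-R sketch of one such panel (the slides' actual plotting code is not shown here):

# Sketch: histogram of the simulated sample means for n = 5,
# with the true mean (1/rate = 2) marked
hist(sample_means$n5, breaks = 50,
     main = "Sampling distribution of the mean, n = 5",
     xlab = "Sample mean")
abline(v = 2, col = "red", lwd = 2)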

Visualizing LLN (n = 5)

Visualizing LLN (n = 15)

Visualizing LLN (n = 30)

Visualizing LLN (n = 100)

Visualizing LLN (n = 1000)

Visualizing LLN (n = 10000)

Chebyshev Inequality

How can we work out convergence in probability for an arbitrary distribution?

Chebyshev Inequality

Suppose that X is a random variable with finite variance. Then, for every real number \(\delta > 0\),

\[P(|X - E[X]| > \delta) \leq \frac{Var[X]}{\delta^{2}}\]

The intuition is that the variance limits how likely an observation is to be far from its mean.
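A quick empirical check of the inequality (a sketch; the Exponential(rate = 0.5) distribution, with \(E[X] = 2\) and \(Var[X] = 4\), is an arbitrary choice):

# Sketch: empirical check of Chebyshev for X ~ Exponential(rate = 0.5),
# where E[X] = 2 and Var[X] = 4
set.seed(42)
x <- rexp(100000, rate = 0.5)

for (delta in c(2, 4, 6)) {
  empirical <- mean(abs(x - 2) > delta)   # estimated P(|X - E[X]| > delta)
  bound <- 4 / delta^2                    # Chebyshev bound Var[X] / delta^2
  cat("delta =", delta, " empirical =", round(empirical, 4),
      " bound =", round(bound, 4), "\n")
}
# The empirical probability always sits below the (often loose) bound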

Proof of Chebyshev

Let Z = X - E[X]. Then we have:

\[P(|Z| \geq \delta) = P(Z^{2} \geq \delta^{2}) \leq \frac{E[Z^{2}]}{\delta^{2}} = \frac{E[(X - E[X])^{2}]}{\delta^{2}} = \frac{Var[X]}{\delta^{2}} \]

where the inequality is Markov's inequality applied to \(Z^{2}\). Given finite variance, applying this to \(Z = \bar{X}_{n} - \mu\), whose variance \(\frac{\sigma^{2}}{n} \rightarrow 0\), proves the LLN.

Gerber et al Example

Let \(T_{i} = 1\) if you are in the Neighbors group and 0 if in the control (ignore the other groups).

Our estimator is the difference-in-means estimator

\[ \widehat{\tau_{n}} = \frac{\sum_{i=1}^{n}Y_{i}T_{i}}{\sum_{i = 1}^{n}T_{i}} - \frac{\sum_{i=1}^{n}Y_{i}(1-T_{i})}{\sum_{i = 1}^{n}(1-T_{i})} \]

Can we show that this estimator converges in probability to the population difference in means parameter, \(\tau\)?

Proving Consistency

Let's focus on the sample mean for the treated units, \(T_{i} = 1\)

\[ \frac{\sum_{i=1}^{n}Y_{i}T_{i}}{\sum_{i = 1}^{n}T_{i}} = \frac{\frac{1}{n}\sum_{i=1}^{n}Y_{i}T_{i}}{\frac{1}{n}\sum_{i=1}^{n}T_{i}} \overset{p}\rightarrow \frac{E[Y_{i}T_{i}]}{E[T_{i}]} = E[Y_{i}|T_{i} = 1] \]

The LLN applies to the numerator and denominator separately, and the ratio converges because the limit of the denominator is nonzero. The same argument applies to the control group, so \(\widehat{\tau}_{n} \overset{p}{\rightarrow} \tau\).
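A simulation sketch of this consistency result. The turnout probabilities (0.38 in the Neighbors group, 0.30 in control, so \(\tau = 0.08\)) are made up for illustration and are not Gerber et al.'s estimates:

# Sketch: the difference-in-means estimator converging to tau
# (0.38 vs. 0.30 turnout rates are made up for illustration; true tau = 0.08)
set.seed(42)
sim_tau_hat <- function(n) {
  t <- rbinom(n, 1, 0.5)                          # random assignment
  y <- rbinom(n, 1, ifelse(t == 1, 0.38, 0.30))   # turnout
  sum(y * t) / sum(t) - sum(y * (1 - t)) / sum(1 - t)
}

sapply(c(100, 1000, 10000, 100000), sim_tau_hat)
# Estimates settle near the true difference of 0.08 as n grows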

Consistent vs Biased

  • The Chebyshev inequality also tells us that unbiased estimators are consistent, if \(Var[\widehat{\theta}_{n}] \rightarrow 0\)

  • Recall that the first observation estimator \(\widehat{\theta}^{f}_{n} = X_{1}\) is inconsistent

    • Unbiased, because \(E[\widehat{\theta}_{n}^{f}] = E[X_{1}] = \mu\)

    • But its variance is constant at \(\sigma^{2}\); it never shrinks

    • More data doesn't improve our estimate

  • Possible to have a consistent, but biased estimator

    • For example, \(\frac{1}{n-1}\sum_{i=1}^{n}X_{i}\) (replacing n with n-1 in the sample mean) is biased for \(\mu\) but still consistent; see the sketch below.
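A sketch of that last example, again using Exponential(rate = 0.5) draws with true mean 2:

# Sketch: the "divide by n - 1" mean is biased but consistent
set.seed(42)
biased_mean <- function(n) sum(rexp(n, rate = 0.5)) / (n - 1)

# Bias at n = 5: E[estimator] = mu * n / (n - 1) = 2.5, not 2
mean(replicate(20000, biased_mean(5)))

# But the estimator still converges to 2 as n grows
sapply(c(10, 100, 1000, 100000), biased_mean)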

Where we are

  • We are so close to hypothesis testing now! For iid rvs with \(E[X_{i}] = \mu\) and \(Var[X_{i}] = \sigma^{2}\)

    • \(E[\bar{X}_{n}] = \mu\) and \(Var[\bar{X}_{n}] = \frac{\sigma^{2}}{n}\)

    • \(\bar{X}_{n}\) converges in probability to \(\mu\) as n gets really really big

    • Chebyshev lets us bound the probability that \(\bar{X}_{n}\) is far from \(\mu\)

  • Last step

    • How can we approximate \(P(a \leq \bar{X}_{n} < b)\)?

    • What distribution will that take on?

Convergence in Distribution

Convergence in Distribution

Let \(Z_{1}, Z_{2},...\) be a sequence of rvs and let \(F_{n}(u)\) be the cdf of \(Z_{n}\). Then we say the sequence converges in distribution to an rv W with cdf \(F_{W}(u)\) if

\[ \lim_{n \to \infty} F_{n}(u) = F_{W}(u) \]

at every point u where \(F_{W}\) is continuous.

When n is really really large, the distribution of \(Z_{n}\) is very very similar to that of W.

You may see this referred to as the asymptotic distribution or the large sample distribution

Key Point: If \(X_{n}\overset{p}{\to} X\), then \(X_{n} \overset{d}{\to} X\)
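A concrete example (not from the slides, but a standard one): the t distribution with n degrees of freedom converges in distribution to N(0, 1). A minimal sketch comparing the cdfs at a few points u:

# Sketch: t with n degrees of freedom converges in distribution to N(0, 1);
# compare F_n(u) = pt(u, df = n) to the limiting cdf F_W(u) = pnorm(u)
u <- c(-2, -1, 0, 1, 2)
for (n in c(2, 10, 100, 1000)) {
  cat("n =", n, " max |F_n(u) - F_W(u)| =",
      round(max(abs(pt(u, df = n) - pnorm(u))), 4), "\n")
}
# The gap between F_n and the limiting cdf shrinks as n grows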

Central Limit Theorem

Central Limit Theorem (CLT)

Let \(X_{1},...,X_{n}\) be iid rvs from a distribution with mean \(\mu = E[X_{i}]\) and variance \(\sigma^{2} = Var[X_{i}]\). If \(E[X_{i}^{2}] < \infty\) we have

\[\sqrt{n}(\bar{X}_{n} - \mu) \overset{d}{\to} N(0,\sigma^{2})\]

This result also gives us the following approximation

\[ \bar{X}_{n} \overset{a}{\sim} N\left(\mu, \frac{\sigma^{2}}{n}\right) \]

So as n gets really really big, the sample mean is distributed approximately normal around the population mean, with variance \(\frac{\sigma^{2}}{n}\)
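The "Plotting the CLT" slides that follow show this visually. A minimal simulation sketch of how one might draw such a plot, assuming the same Exponential(rate = 0.5) setup as earlier (\(\mu = 2\), \(\sigma^{2} = 4\)):

# Sketch: the CLT with Exponential(rate = 0.5) draws (mu = 2, sigma^2 = 4)
set.seed(42)
n <- 1000
z <- replicate(10000, sqrt(n) * (mean(rexp(n, rate = 0.5)) - 2))

# The simulated values of sqrt(n) * (x-bar - mu) look N(0, sigma^2 = 4)
mean(z); var(z)   # close to 0 and 4
hist(z, breaks = 50, freq = FALSE,
     main = "sqrt(n) * (sample mean - mu)", xlab = "z")
curve(dnorm(x, mean = 0, sd = 2), add = TRUE, col = "red", lwd = 2)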

Plotting the CLT

Plotting the CLT

Plotting the CLT

Plotting the CLT