Asymptotics and Hypothesis Testing

Will Horne

Where we are

  • We now have ways to estimate various features, including the expectation and variance, of the population from a sample

  • We have nearly all the tools we need to test hypotheses

    • The intuition is to figure out how far out in the CDF of the sampling distribution our estimate falls
  • But we need a way to go from sample estimates to statements about the population without assuming a specific distribution

What we know

  • For iid random variables \(X_{1},...,X_{n}\) with \(E[X_{i}] = \mu\) and \(Var[X_{i}] = \sigma^{2}\)

    • \(\bar{X}_{n}\) is unbiased: \(E[\bar{X}_{n}] = E[X_{i}] = \mu\)

    • Sampling variance: \(Var[\bar{X}_{n}] = \frac{\sigma^{2}}{n}\), where \(\sigma^{2}\) is \(Var[X_{i}]\)

    • Neither of these relies on a specific distribution for X

  • If \(X \sim N(\mu, \sigma^{2})\) then we have the complete distribution and hypothesis testing is straightforward.

    • But we usually don't know the distribution! What then?
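As a quick sanity check, here is a minimal simulation sketch of these two facts. The Uniform(0, 10) distribution is chosen purely for illustration (\(\mu = 5\), \(\sigma^{2} = 100/12\)); nothing about the result depends on that choice.

# Sketch: check E[x-bar] = mu and Var[x-bar] = sigma^2 / n by simulation,
# using Uniform(0, 10) draws (mu = 5, sigma^2 = 100/12), chosen arbitrarily
set.seed(123)
n <- 25
xbars <- replicate(50000, mean(runif(n, min = 0, max = 10)))

mean(xbars)   # close to mu = 5
var(xbars)    # close to sigma^2 / n = (100/12) / 25, about 0.33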

Asymptotics

What can we say about the sample mean as n gets large?

Think about sequences of sample means with increasing n:

\[ \bar{X}_{1} = X_{1}\\ \bar{X}_{2} = \frac{1}{2}(X_{1} + X_{2})\\ \bar{X}_{3} = \frac{1}{3}(X_{1} + X_{2} + X_{3})\\ \vdots\\ \bar{X}_{n} = \frac{1}{n}(X_{1} + X_{2} + X_{3} + ... + X_{n}) \]
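One way to see these sequences concretely is to compute the running mean of a single stream of draws. A minimal sketch, using Exponential(rate = 0.5) draws (true mean 2) to match the simulation code later in these slides:

# Sketch: the running sequence x-bar_1, x-bar_2, ..., x-bar_n
# for one stream of Exponential(rate = 0.5) draws (true mean = 2)
set.seed(42)
x <- rexp(1000, rate = 0.5)
running_means <- cumsum(x) / seq_along(x)

head(running_means)   # early sample means bounce around
tail(running_means)   # later sample means settle near 2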

Asymptotics and Limits

Asymptotic analysis allows us to make approximations about finite sample properties

Definition

A sequence \(\{a_{n}: n = 1, 2, 3,...\}\) has the limit a (\(a_{n} \rightarrow a\)) as \(n \rightarrow \infty\) if for all \(\delta > 0\) there is some \(n_{\delta} < \infty\) such that for all \(n \geq n_{\delta}, |a_{n} - a| \leq \delta\)

This should look familiar! As n gets larger, \(a_{n}\) gets closer and closer to a.

This is called convergence

The sequence is bounded if there is a \(b < \infty\) such that \(|a_{n}| \leq b\) for all n
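A deterministic example may help fix ideas: the sequence \(a_{n} = 1 + \frac{1}{n}\) has limit 1, and for any \(\delta\) we can find the \(n_{\delta}\) from the definition. A minimal sketch:

# Sketch: the sequence a_n = 1 + 1/n converges to a = 1
n <- 1:1000
a_n <- 1 + 1 / n

delta <- 0.01
n_delta <- which(abs(a_n - 1) <= delta)[1]   # first n with |a_n - a| <= delta
n_delta                                      # 100; for all n >= 100 the sequence stays within delta of 1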

Visualizing Convergence

Definition

A sequence \(\{a_{n}: n = 1, 2, 3,...\}\) has the limit a (\(a_{n} \rightarrow a\)) as \(n \rightarrow \infty\) if for all \(\delta > 0\) there is some \(n_{\delta} < \infty\) such that for all \(n \geq n_{\delta}, |a_{n} - a| \leq \delta\)

Making \(\delta\) Smaller

Definition

A sequence \(\{a_{n}: n = 1, 2, 3,...\}\) has the limit a (\(a_{n} \rightarrow a\)) as \(n \rightarrow \infty\) if for all \(\delta > 0\) there is some \(n_{\delta} < \infty\) such that for all \(n \geq n_{\delta}, |a_{n} - a| \leq \delta\)

Convergence in Probability

Definition

A sequence of random variables, \(\{Z_{n}: n = 1,2,...\}\), converges in probability to a value b if for every \(\epsilon > 0\)

\[P(|Z_{n} - b| > \epsilon) \rightarrow 0\]

as \(n \rightarrow \infty\). Written \(Z_{n} \overset{p}{\rightarrow} b\)

  • Probability that \(Z_{n}\) lies outside an arbitrarily small (the smallest interval you can imagine!) interval around b approaches 0 as \(n \rightarrow \infty\)

    • Also written plim(\(Z_{n}\)) = b
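A simulation sketch of this definition: estimate \(P(|\bar{X}_{n} - \mu| > \epsilon)\) for increasing n. The Exponential(rate = 0.5) draws (so \(\mu = 2\)) and the value of \(\epsilon\) are arbitrary choices for illustration.

# Sketch: estimate P(|x-bar_n - mu| > eps) for increasing n,
# using Exponential(rate = 0.5) draws (mu = 2) and an arbitrary eps
set.seed(42)
eps <- 0.25
ns <- c(5, 30, 100, 1000)

sapply(ns, function(n) {
  xbars <- replicate(5000, mean(rexp(n, rate = 0.5)))
  mean(abs(xbars - 2) > eps)   # share of sample means farther than eps from mu
})
# The estimated probabilities shrink toward 0 as n grows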

Consistency

Definition

An estimator is consistent if \(\widehat{\theta}_{n} \overset{p}{\rightarrow} \theta\)

  • Distribution of \(\widehat{\theta}_{n}\) collapses on \(\theta\) as \(n \rightarrow \infty\)

  • Inconsistent estimators are really bad. We don't use them!

    • As we get more data, our estimates do not improve, and may get worse
  • Unbiased means the expectation of the estimator is the parameter. Consistent means we get closer to the parameter as we get more data.

Graphic Plim(X)

Understanding Check: The "first observation" estimator is unbiased. Is it consistent?
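One way to explore the understanding check by simulation (a sketch, again using Exponential(rate = 0.5) draws with true mean 2):

# Sketch: sample mean vs. "first observation" estimator as n grows
# (Exponential(rate = 0.5) draws, so the true mean is 2)
set.seed(42)
for (n in c(10, 100, 1000, 10000)) {
  x <- rexp(n, rate = 0.5)
  cat("n =", n,
      " sample mean =", round(mean(x), 3),
      " first obs =", round(x[1], 3), "\n")
}
# The sample mean settles near 2; the first observation does not improve with n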

Law of Large Numbers

Law of Large Numbers

Let \(X_{1},...,X_{n}\) be n iid draws from a distribution with \(E[|X_{i}|] < \infty\) and mean \(\mu = E[X_{i}]\). Let \(\bar{X}_{n} = \frac{1}{n}\sum_{i = 1}^{n}X_{i}\). Then \(\bar{X}_{n} \overset{p}{\rightarrow} E[X_{i}]\).

  • The probability of \(\bar{X}_{n}\) being far away from \(\mu\) goes to 0 as n gets big.

    • Issue: How large of an n is big enough?
  • Implies that our plug-in estimators (Sample mean and variance, etc) are consistent

    • If \(E[|g(X_{i})|] < \infty\), then \(\frac{1}{n}\sum_{i = 1}^{n}g(X_{i}) \overset{p}{\rightarrow} E[g(X_{i})]\)
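The plug-in claim is easy to check for a particular g. A minimal sketch with \(g(x) = x^{2}\), using Exponential(rate = 0.5) draws, for which \(E[X_{i}^{2}] = Var[X_{i}] + \mu^{2} = 4 + 4 = 8\):

# Sketch: plug-in estimator for E[g(X_i)] with g(x) = x^2
# For Exponential(rate = 0.5): E[X^2] = Var[X] + mu^2 = 4 + 4 = 8
set.seed(42)
x <- rexp(100000, rate = 0.5)
mean(x^2)   # close to 8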

Simulating the LLN

# Set parameters
nsims <- 10000  # Number of simulations
# Exponential(rate = 0.5) draws have true mean 1/rate = 2

# Preallocate a matrix to hold the sample means for each sample size
sample_means <- matrix(NA, nrow = nsims, ncol = 6)

# Perform simulations
for (i in 1:nsims) {
  s5 <- rexp(n = 5, rate = 0.5)
  s15 <- rexp(n = 15, rate = 0.5)
  s30 <- rexp(n = 30, rate = 0.5)
  s100 <- rexp(n = 100, rate = 0.5)
  s1000 <- rexp(n = 1000, rate = 0.5)
  s10000 <- rexp(n = 10000, rate = 0.5)
  
  sample_means[i, 1] <- mean(s5)
  sample_means[i, 2] <- mean(s15)
  sample_means[i, 3] <- mean(s30)
  sample_means[i, 4] <- mean(s100)
  sample_means[i, 5] <- mean(s1000)
  sample_means[i, 6] <- mean(s10000)
}

# Convert holder matrix to data frame for easier plotting
sample_means <- data.frame(n5 = sample_means[, 1], n15 = sample_means[, 2], n30 = sample_means[, 3],
                        n100 = sample_means[, 4], n1000 = sample_means[, 5], n10000 = sample_means[, 6])
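The "Visualizing LLN" slides that follow plot these sampling distributions. A minimal base-R sketch of one such panel (the slides' actual plotting code is not shown here):

# Sketch: histogram of the simulated sample means for n = 5,
# with the true mean (1/rate = 2) marked
hist(sample_means$n5, breaks = 50,
     main = "Sampling distribution of the mean, n = 5",
     xlab = "Sample mean")
abline(v = 2, col = "red", lwd = 2)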

Visualizing LLN (n = 5)

Visualizing LLN (n = 15)

Visualizing LLN (n = 30)

Visualizing LLN (n = 100)

Visualizing LLN (n = 1000)

Visualizing LLN (n = 10000)

Chebyshev Inequality

How can we work out convergence in probability for an arbitrary distribution?

Chebyshev Inequality

Suppose that X is a random variable with finite variance. Then, for every real number \(\delta > 0\),

\[P(|X - E[X]| > \delta) \leq \frac{Var[X]}{\delta^{2}}\]

The intuition is that the variance limits how likely an observation is to be far from its mean.
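A quick empirical check of the inequality (a sketch; the Exponential(rate = 0.5) distribution, with \(E[X] = 2\) and \(Var[X] = 4\), is an arbitrary choice):

# Sketch: empirical check of Chebyshev for X ~ Exponential(rate = 0.5),
# where E[X] = 2 and Var[X] = 4
set.seed(42)
x <- rexp(100000, rate = 0.5)

for (delta in c(2, 4, 6)) {
  empirical <- mean(abs(x - 2) > delta)   # estimated P(|X - E[X]| > delta)
  bound <- 4 / delta^2                    # Chebyshev bound Var[X] / delta^2
  cat("delta =", delta, " empirical =", round(empirical, 4),
      " bound =", round(bound, 4), "\n")
}
# The empirical probability always sits below the (often loose) bound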

Proof of Chebyshev

Let Z = X - E[X]. Then we have:

\[P(|Z| \geq \delta) = P(Z^{2} \geq \delta^{2}) \leq \frac{E[Z^{2}]}{\delta^{2}} = \frac{E[(X - E[X])^{2}]}{\delta^{2}} = \frac{Var[X]}{\delta^{2}} \]

where the inequality is Markov's inequality applied to \(Z^{2}\). Given finite variance, applying this to \(Z = \bar{X}_{n} - \mu\), whose variance \(\frac{\sigma^{2}}{n} \rightarrow 0\), proves the LLN.

Gerber et al Example

Let \(T_{i} = 1\) if you are in the Neighbors group and 0 if in the control (ignore the other groups).

Our estimator is the difference-in-means estimator

\[ \widehat{\tau_{n}} = \frac{\sum_{i=1}^{n}Y_{i}T_{i}}{\sum_{i = 1}^{n}T_{i}} - \frac{\sum_{i=1}^{n}Y_{i}(1-T_{i})}{\sum_{i = 1}^{n}(1-T_{i})} \]

Can we show that this estimator converges in probability to the population difference in means parameter, \(\tau\)?

Proving Consistency

Let's focus on the sample mean for the treated units, \(T_{i} = 1\)

\[ \frac{\sum_{i=1}^{n}Y_{i}T_{i}}{\sum_{i = 1}^{n}T_{i}} = \frac{\frac{1}{n}\sum_{i=1}^{n}Y_{i}T_{i}}{\frac{1}{n}\sum_{i=1}^{n}T_{i}} \overset{p}\rightarrow \frac{E[Y_{i}T_{i}]}{E[T_{i}]} = E[Y_{i}|T_{i} = 1] \]

The LLN applies to the numerator and denominator separately, and the ratio converges because the limit of the denominator is nonzero. The same argument applies to the control group, so \(\widehat{\tau}_{n} \overset{p}{\rightarrow} \tau\).
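A simulation sketch of this consistency result. The turnout probabilities (0.38 in the Neighbors group, 0.30 in control, so \(\tau = 0.08\)) are made up for illustration and are not Gerber et al.'s estimates:

# Sketch: the difference-in-means estimator converging to tau
# (0.38 vs. 0.30 turnout rates are made up for illustration; true tau = 0.08)
set.seed(42)
sim_tau_hat <- function(n) {
  t <- rbinom(n, 1, 0.5)                          # random assignment
  y <- rbinom(n, 1, ifelse(t == 1, 0.38, 0.30))   # turnout
  sum(y * t) / sum(t) - sum(y * (1 - t)) / sum(1 - t)
}

sapply(c(100, 1000, 10000, 100000), sim_tau_hat)
# Estimates settle near the true difference of 0.08 as n grows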

Consistent vs Biased

  • The Chebyshev inequality also tells us that unbiased estimators are consistent, if \(Var[\widehat{\theta}_{n}] \rightarrow 0\)

  • Recall that the first observation estimator \(\widehat{\theta}^{f}_{n} = X_{1}\) is inconsistent

    • Unbiased, because \(E[\widehat{\theta}_{n}^{f}] = E[X_{1}] = \mu\)

    • But its variance is constant at \(\sigma^{2}\); it never shrinks

    • More data doesn't improve our estimate

  • Possible to have a consistent, but biased estimator

    • For example, \(\frac{1}{n-1}\sum_{i=1}^{n}X_{i}\) (replacing n with n-1 in the sample mean) is biased for \(\mu\) but still consistent; see the sketch below.
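A sketch of that last example, again using Exponential(rate = 0.5) draws with true mean 2:

# Sketch: the "divide by n - 1" mean is biased but consistent
set.seed(42)
biased_mean <- function(n) sum(rexp(n, rate = 0.5)) / (n - 1)

# Bias at n = 5: E[estimator] = mu * n / (n - 1) = 2.5, not 2
mean(replicate(20000, biased_mean(5)))

# But the estimator still converges to 2 as n grows
sapply(c(10, 100, 1000, 100000), biased_mean)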

Where we are

  • We are so close to hypothesis testing now! For iid rvs with \(E[X_{i}] = \mu\) and \(Var[X_{i}] = \sigma^{2}\)

    • \(E[\bar{X}_{n}] = \mu\) and \(Var[\bar{X}_{n}] = \frac{\sigma^{2}}{n}\)

    • \(\bar{X}_{n}\) converges in probability to \(\mu\) as n gets really really big

    • Chebyshev lets us bound the probability that \(\bar{X}_{n}\) is far from \(\mu\)

  • Last step

    • How can we approximate \(P(a \leq \bar{X}_{n} < b)\)?

    • What distribution will that take on?

Convergence in Distribution

Convergence in Distribution

Let \(Z_{1}, Z_{2},...\) be a sequence of rvs and let \(F_{n}(u)\) be the cdf of \(Z_{n}\). Then we say the sequence converges in distribution to an rv W with cdf \(F_{W}(u)\) if

\[ \lim_{n \to \infty} F_{n}(u) = F_{W}(u) \]

at every point u where \(F_{W}\) is continuous.

When n is really really large, the distribution of \(Z_{n}\) is very very similar to that of W.

You may see this referred to as the asymptotic distribution or the large sample distribution

Key Point: If \(X_{n}\overset{p}{\to} X\), then \(X_{n} \overset{d}{\to} X\)
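A concrete example (not from the slides, but a standard one): the t distribution with n degrees of freedom converges in distribution to N(0, 1). A minimal sketch comparing the cdfs at a few points u:

# Sketch: t with n degrees of freedom converges in distribution to N(0, 1);
# compare F_n(u) = pt(u, df = n) to the limiting cdf F_W(u) = pnorm(u)
u <- c(-2, -1, 0, 1, 2)
for (n in c(2, 10, 100, 1000)) {
  cat("n =", n, " max |F_n(u) - F_W(u)| =",
      round(max(abs(pt(u, df = n) - pnorm(u))), 4), "\n")
}
# The gap between F_n and the limiting cdf shrinks as n grows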

Central Limit Theorem

Central Limit Theorem (CLT)

Let \(X_{1},...,X_{n}\) be iid rvs from a distribution with mean \(\mu = E[X_{i}]\) and variance \(\sigma^{2} = Var[X_{i}]\). If \(E[X_{i}^{2}] < \infty\) we have

\[\sqrt{n}(\bar{X}_{n} - \mu) \overset{d}{\to} N(0,\sigma^{2})\]

This result also gives us the following approximation

\[ \bar{X}_{n} \overset{a}{\sim} N\left(\mu, \frac{\sigma^{2}}{n}\right) \]

So as n gets really really big, the sample mean is distributed approximately normal around the population mean, with variance \(\frac{\sigma^{2}}{n}\)
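The "Plotting the CLT" slides that follow show this visually. A minimal simulation sketch of how one might draw such a plot, assuming the same Exponential(rate = 0.5) setup as earlier (\(\mu = 2\), \(\sigma^{2} = 4\)):

# Sketch: the CLT with Exponential(rate = 0.5) draws (mu = 2, sigma^2 = 4)
set.seed(42)
n <- 1000
z <- replicate(10000, sqrt(n) * (mean(rexp(n, rate = 0.5)) - 2))

# The simulated values of sqrt(n) * (x-bar - mu) look N(0, sigma^2 = 4)
mean(z); var(z)   # close to 0 and 4
hist(z, breaks = 50, freq = FALSE,
     main = "sqrt(n) * (sample mean - mu)", xlab = "z")
curve(dnorm(x, mean = 0, sd = 2), add = TRUE, col = "red", lwd = 2)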

Plotting the CLT

Plotting the CLT

Plotting the CLT

Plotting the CLT