Lecture 6: Measures of Spread and Continuous Random Variables

Will Horne

Measures of Spread

So far, we have focused on how to calculate the probability of events, and the expected value, given a distribution! The expectation is a nice summary statistic, but without a measure of spread it is incomplete.

Think about the binomial and the hypergeometric (sampling with and without replacement). The expectation is the same, but the distributions sometimes look very different!

Same Expectation, Different Spread

Motivation

We don’t just want to know where the center of gravity of the data is; we also want to know how spread out it is

In practical terms, the wider the spread of the data, and the fatter the tails, the less surprised we should be by values far from the expectation

In the second half of the semester, this will be crucial for hypothesis testing. The basic intuition of hypothesis testing is that we measure how surprised we should be by a value if there were no relationship between our independent and dependent variables.

Variance

  • We would like a metric that tells us how far from E[X] the values of X typically fall.

  • That metric is called variance

    • If the variance is small, we expect the realizations of X to cluster around E[X], and we would be surprised to see values far from E[X]

    • If the variance is large, we expect the realizations of X to be quite spread out, and we would not be surprised to see values far from E[X]

Defining Variance

The Variance, which measures the spread of the distribution, is defined as:

\[ Var[X] = E[(X -E[X])^{2}] \]

Why not just use E[X - E[X]]? By linearity, \(E[X - E[X]] = E[X] - E[X] = 0\) for every random variable, so we square the deviations first.

In practice, the variance is a weighted average of the squared distances from the mean

A common representation of the variance is

\[ Var[X] = E[X^{2}] - (E[X])^{2} \]
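To see why these agree, expand the square and apply linearity of expectation (recall that E[X] is just a constant):

\[ E[(X - E[X])^{2}] = E[X^{2}] - 2E[X]E[X] + E[X]^{2} = E[X^{2}] - (E[X])^{2} \]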

Standard Deviation

\[ Var[X] = E[(X -E[X])^{2}] \]

Note that the variance is the expected squared distance from the mean, so it is measured in squared units. If we want a measure of spread on the same scale as X itself, we can calculate the standard deviation

\[ SD(X) = \sqrt{Var[X]} \]

In practice, we usually work with variance rather than standard deviation, but standard deviation is more immediately interpretable

Example: Weighted Dice

Calculate the Expectation, Variance and Standard Deviation for a weighted dice roll with the following PMF:

x P(X = x)
1 0.1
2 0.15
3 0.2
4 0.25
5 0.2
6 0.1
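One way to compute these in R (a minimal sketch; the object names are arbitrary):

## Weighted die: expectation, variance, and standard deviation
x <- 1:6
p <- c(0.1, 0.15, 0.2, 0.25, 0.2, 0.1)
ex <- sum(x * p)     # E[X] = 3.6
ex2 <- sum(x^2 * p)  # E[X^2] = 15.1
ex2 - ex^2           # Var[X] = 2.14
sqrt(ex2 - ex^2)     # SD(X), about 1.46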

Another Variance Example

Imagine we want to know both the average (expectation) and spread (variance/standard deviation) of incomes in a community. We randomly select 10 households who have the following incomes (in thousands of US Dollars). Let X be the income of each respondent. We have:

\[ X = [45,50,52,47,60,55,120,28,430,73] \]

How can we calculate the expectation, variance and standard deviation?
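One approach in R (note that var() and sd() use the sample formulas with an n - 1 denominator, slightly different from the definitions above):

## Household incomes in thousands of USD
incomes <- c(45, 50, 52, 47, 60, 55, 120, 28, 430, 73)
mean(incomes)  # 96
var(incomes)   # sample variance (n - 1 denominator)
sd(incomes)    # sample standard deviation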

Properties of Variance

  • Var[X + c] = Var[X] for any constant c

  • If c is a constant, \(Var[cX] = c^{2}Var[X]\) (see the simulation sketch after this list)

  • If X and Y are independent, Var[X + Y] = Var[X] + Var[Y]

  • If X is not a constant, Var[X] > 0.
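These properties are easy to check by simulation. A minimal sketch for the scaling rule (the seed and the constant c = 3 are arbitrary choices):

## Var[cX] = c^2 Var[X]
set.seed(1)
x <- rnorm(1e5)
var(3 * x)    # approximately equal to...
3^2 * var(x)  # ...this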

Binomial Variance

Rather than use the formula, we can also use story proofs to find the variance of known distributions

Recall that X ~ Bin(n,p) is the sum of n independent Bernoulli trials

The variance of a single Bernoulli trial \(X_{i}\) is easy:

\[ Var[X_{i}] = E[X_{i}^{2}] - E[X_{i}]^{2} = p - p^{2} = p(1-p) \]

A Binomial is the sum of n independent Bernoulli trials, so the variances add:

\[ Var[X] = n(p - p^{2}) = np(1-p) \]
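A quick simulation check (the parameters n = 10 and p = 0.3 are arbitrary choices):

## Var[Bin(10, 0.3)] should be close to 10 * 0.3 * 0.7 = 2.1
set.seed(42)
draws <- rbinom(1e5, size = 10, prob = 0.3)
var(draws)
10 * 0.3 * 0.7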

Taking Stock

  • So far: Probability Theory and Discrete Random Variables

    • You should feel comfortable about distributions of data that take on discrete values

    • You should have a good idea of what the PMF and CDF of discrete variables mean

  • Now: Same idea for variables that can take on any value

    • Many variables we care about as social scientists are (approximately) continuous

      • Income, Time, Tax Rates, Vote Shares

      • Sample means of variables are also continuous

Continuous Random Variables

For a discrete RV, \(P(X=x) > 0\) for all values in the support.

Does not hold for a continuous RV. Let’s see why:

Suppose \(P(X = x) = \epsilon\) for every \(x \in (0,1)\), where \(\epsilon > 0\) is arbitrarily small.

How many real numbers are there between 0 and 1?

Uncountably many. If each had probability \(\epsilon > 0\), then \(P(X \in (0,1))\) would be infinite, which is impossible. So each individual point must have probability 0.

CRV Continued

Let X be distributed continuous uniform from 0 to 10.

What is P(X = 3)?

What about P(X = 0.194345111223)?

In general, for a continuous random variable, P(X = x) = 0 for every x.

This does not mean X = 3 cannot happen. It is in the support (defined as 0 to 10).

So…the PMF is pretty useless now. All values for X have zero mass!

The Continuous CDF

Definition: A random variable X is continuous if its CDF \(F(x) = P(X \leq x)\) is a continuous function.

Relationship to Discrete

When we analyze discrete distributions, the PMF tells us the point probability P(X = x). We can easily compute the probability that our random variable takes on any given value.

We cannot do this for continuous random variables! But…we can use calculus to find the probability that X lies in an interval, using the CDF.

What calculus operator would be useful here?

Probability Density Functions

The probability density function of a continuous random variable X is the function \(f_{x}(x)\) that satisfies

\[ F_{x}(x) = \int_{-\infty}^{x} f_{x}(t)dt \]

By the fundamental theorem of calculus, this is the derivative of the c.d.f. So, all we are doing is replacing \(\Sigma\) with \(\int\)

So, \(P(a < X < b) = P(X \leq b) - P(X \leq a) = \int_{a}^{b}f_{x}(x)dx\)

Note - continuity means \(P(a < X < b) = P(a \leq X \leq b)\)
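For example, for a random variable with density \(f_{x}(x) = \frac{1}{10}\) on (0, 10), \(P(3 < X < 5) = \int_{3}^{5}\frac{1}{10}dx = 0.2\)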

Graphing the PDF

Intuition - Hypothesis Testing

Properties of a PDF

  • The area under the curve over a region is equal to the probability of X falling in that region

  • The support of X is all values for which \(f_{x}(x) > 0\)

  • All valid PDFs

    • Are nonnegative: \(f_{x}(x) \geq 0\)

    • Integrate (rather than sum) to 1: \(\int_{-\infty}^{\infty} f_{x}(x)dx = 1\)

  • Unlike with a PMF, \(f_{x}(x)\) can be greater than 1!

Continuous Uniform Distribution

Let X be a random variable with PDF \(f_{x}(x) = 1\) if x is in the interval (0,1) and \(f_{x}(x) = 0\) otherwise.

Graphically:

Density Intuition

The CDF for Uniform (0,1)

Generic Continuous Uniform

Any continuous random variable X whose density is constant over the entire support is distributed Uniform. Can we work out the PDF?

If X is uniform on (a,b), the pdf is

\[ f(x) = \begin{cases} \frac{1}{b - a} & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases} \]

Relatedly, if (c,d) is a subinterval of (a,b), then \(P(X \in (c,d)) = \int_{c}^{d}\frac{1}{b-a}dx = \frac{d-c}{b-a}\)

PDF for Uniform (0, 0.5)

Working with the Uniform in R

dunif evaluates the density \(f(x)= \frac{1}{b-a}\), for a < x < b, at a given value x.

  • x: the value of x in f(x)

  • min: the lower bound of the interval (a). Default is 0.

  • max: the upper bound of the interval (b). Default is 1.

## What will each of these evaluate to?
dunif(0.5)            # Unif(0, 1): density is 1 on (0, 1), so 1
dunif(0.5, .25, .75)  # Unif(0.25, 0.75): 1 / (0.75 - 0.25) = 2

Working with the Uniform in R

punif computes the CDF \(F(x) = P(X \leq x)\) of X.

  • q: the value of x in F(x)

  • min: the lower bound of the interval (a). Default is 0.

  • max: the upper bound of the interval (b). Default is 1.

## What will each of these evaluate to?
punif(0.5)            # P(X <= 0.5) for Unif(0, 1): 0.5
punif(0.5, .25, .75)  # (0.5 - 0.25) / (0.75 - 0.25) = 0.5

Working with the Uniform in R

A very useful thing you can do in R is simulate data with a certain distribution.

runif draws random numbers from a uniform distribution

  • n: the sample size we want

  • min: the lower bound of the interval

  • max: the upper bound of the interval

## Run this on your machine:

random_numbers <- runif(20, -10, 10)
random_numbers

Setting a seed

In R (or any statistical program), we often want our results to be reproducible

But….if we run runif over and over again, we get different draws every time.

The solution is to set a seed, using set.seed()

## we set a seed for reproducibility (it will generate the same numbers each time)
set.seed(29631)
random_numbers <- runif(20, -10, 10)
random_numbers
 [1]  8.5734087  5.3892572 -7.5406310 -8.4881313  4.2082382 -9.6233854
 [7]  1.8259260  8.1478078  0.6631944  5.7339422  8.9260468 -6.3962945
[13]  9.9210618  6.6941542 -0.5974971  3.5098407 -8.7295550  5.4518538
[19] -5.4562228  3.5680572

Expectation of a Continuous RV

For any continuous random variable X, the expectation is

\[ E[X] = \int_{-\infty}^{\infty} xf_{x}(x)dx \]

What does this mean? How does it relate to the discrete version?

Expectation of a Uniform(a,b) RV

From the definition of expectation

\[ E[X] = \int_{a}^{b} xf_{x}(x)dx = \int_{a}^{b} x\frac{1}{b-a} dx \]

solving the integral and evaluating for the interval (a,b) gives us

\[ \frac{x^{2}}{2(b-a)} \Big|_{a}^{b} = \frac{b^{2} - a^{2}}{2(b-a)} = \frac{(b+a)(b-a)}{2(b-a)} = \frac{a+b}{2} \]
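We can sanity-check this numerically in R, here assuming the interval (2, 10) so that (a + b)/2 = 6:

## E[X] for Unif(2, 10): integrate x * f(x) over (2, 10)
integrate(function(x) x * dunif(x, 2, 10), lower = 2, upper = 10)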

Variance of a Continuous RV

We already know the definition of Variance for any r.v. X.

\[ Var[X] = E[(X - E[X])^{2}] \]

Again, analogous to the discrete case

\[ Var[X] = \int_{-\infty}^{\infty} (x - E[X])^{2}f_{x}(x)dx \]

All the properties of expectation and variance (like linearity) hold in the continuous case. Importantly - Var[X] is still equal to \(E[X^{2}] - E[X]^{2}\)

LOTUS

We know that the variance of any RV X is \(E[X^{2}] - E[X]^{2}\). We can easily get \(E[X]^{2}\), but what about \(E[X^{2}]\)?

Calculating \(E[X^{2}]\) by first deriving the distribution of \(X^{2}\) is possible but quite difficult. Fortunately, there is an easier way!

The Law of the Unconscious Statistician (LOTUS) says that we can compute \(E[g(X)]\) directly from the distribution of X, without first finding the distribution of g(X):

\[ E[g(X)] = \int_{-\infty}^{\infty} g(x) f_{x}(x)dx \]

Variance of a Uniform RV

LOTUS means we can set \(g(X) = X^{2}\) and take the expectation using the pdf of X

\[ E[X^{2}] = \int_{a}^{b} x^{2}f_{x}(x)dx \]

We can evaluate the definite integral

\[ E[X^{2}] = \frac{1}{b-a}\frac{x^{3}}{3} \Big|_{a}^{b} = \frac{b^{3} - a^{3}}{3(b-a)} \]

Variance of a Uniform(a,b) RV

So, finally….

\[ Var[X] = E[X^{2}] - E[X]^{2} = \frac{b^{3} - a^{3}}{3(b-a)} - \bigg(\frac{a+b}{2}\bigg)^{2} \]

After an annoying amount of algebra (feel free to do this at home, use the difference in cubes formula!), this simplifies to:

\[ Var[X] = \frac{(b-a)^{2}}{12} \]
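A quick numerical check for the Uniform(0, 1) case, where the formula gives 1/12, or about 0.0833 (the seed is arbitrary):

## Simulated variance of Unif(0, 1) draws
set.seed(8)
var(runif(1e6))  # close to...
1 / 12           # ...this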

Standard Normal Distribution

A continuous random variable Z follows a standard normal distribution, with E[Z] = 0 and \(Var[Z] = 1\), if its pdf \(\psi\) is:

\[ \psi(z) = \frac{1}{\sqrt{2\pi}}e^{-z^{2}/2}, \text{ for } -\infty < z < \infty \\ \text{written as } Z \sim N(0,1) \]

The CDF has no closed-form solution, but is written by convention as

\[ \Phi(z) = \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt \]

Plotting the Normal

General Normal

If \(Z \sim N(0,1)\) then

\[ X = \mu + \sigma Z \]

is also distributed normal with mean \(\mu\) and variance \(\sigma^{2}\) .

\[ f_{x}(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}} \]

Importantly, we can get back to the standard normal through standardization:

\(\frac{X - \mu}{\sigma} \sim N(0,1)\)

Normal Distributions in R

dnorm, pnorm and rnorm do the same things as their uniform counterparts
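For example, pnorm lets us verify the standardization identity from the previous slide (the values \(\mu = 5\), \(\sigma = 2\), and x = 7 are arbitrary choices):

## P(X <= 7) for X ~ N(5, 4), computed two ways
pnorm(7, mean = 5, sd = 2)  # directly
pnorm((7 - 5) / 2)          # via the standard normal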

Let’s try and plot a couple of normal distributions.

Generate 20 random numbers from a normal(0,1) and plot the distribution (use a histogram with bin width 0.5).

Generate 200 random numbers from a normal(0,1) and plot the distribution.

Finally, generate 2000 random numbers and plot.
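One possible solution (a sketch; the seed is arbitrary, and base R graphics are just one option):

## Histograms of N(0, 1) draws for increasing sample sizes
set.seed(123)
for (n in c(20, 200, 2000)) {
  z <- rnorm(n)
  brks <- seq(floor(min(z)), ceiling(max(z)), by = 0.5)  # bin width 0.5
  hist(z, breaks = brks, main = paste("n =", n), xlab = "z")
}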

Inverse Functions

An inverse function essentially “reverses” the effect of a given function. If a function f maps an element x to f(x), then its inverse \(f^{-1} (x)\) will map f(x) back to x.

Definition: For a function f(x), the inverse function \(f^{-1}(x)\) satisfies:

\(f(f^{-1}(x)) = x \quad \text{and} \quad f^{-1}(f(x)) = x\)

Intuition: What is the inverse of \(f(x) = x^{2}\) for \(x >0\)?

Quantile Functions

  • The inverse of the CDF, \(F^{-1}\) is called the quantile function

    • \(F^{-1}(\alpha)\) is the value of x such that \(P(X \leq x) = \alpha\)

    • The quantile function takes probabilities as arguments

    • \(F^{-1}(0.5)\) is the median, \(F^{-1}(0.9)\) is the upper decile

  • Soon: One way to obtain our confidence intervals is from the quantile function. \(F^{-1}(0.975)\) is the upper bound of a 95% confidence interval (see the R example below)
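In R, quantile functions are the q* family (qunif, qnorm, and so on):

## Quantile functions take probabilities and return values of x
qunif(0.5, 0, 10)  # median of Unif(0, 10): 5
qnorm(0.975)       # about 1.96, the 97.5th percentile of N(0, 1)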

Universality of the Uniform (A)

Let U ~ Unif(0,1) and let F be a continuous, strictly increasing CDF. Let \(X = F^{-1}(U)\). Then, X is an r.v. with CDF F.

Proof: for all real x:

\(P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)\)

Example (Uniform to Normal)

Imagine that we had a random number generator that gives us numbers between 0 and 1. So, the output is uniform(0,1). Imagine we spin and get U = 0.975

Suppose we wanted instead random numbers that follow a standard normal distribution. We can get there by plugging 0.975 into \(F^{-1}(U)\), which gives us the corresponding X for the standard normal distribution.

For the standard normal distribution, this gives the X such that \(P(Z \leq X) = 0.975\). In this case, X ≈ 1.96.

If we were to repeat this process many times, we would generate numbers following a standard normal distribution.
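We can watch this work in R: qnorm is \(F^{-1}\) for the standard normal, so uniform draws pushed through it become normal draws (the seed is arbitrary):

## Unif(0, 1) draws become N(0, 1) draws via the inverse CDF
set.seed(4)
u <- runif(1e4)
z <- qnorm(u)
mean(z)  # close to 0
var(z)   # close to 1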

Universality of the Uniform (B)

Let X be an r.v. with CDF F. Then F(X) ~ Unif(0,1).

Proof:

Let X have CDF F and find the CDF of Y = F(X). Since Y takes values in (0,1), \(P(Y \leq y)\) is 0 for \(y \leq 0\) and 1 for \(y \geq 1\). For \(y \in (0,1)\):

\(P(Y \leq y) = P(F(X) \leq y) = P(X \leq F^{-1}(y)) = F(F^{-1}(y)) = y\)

Example (Normal to Uniform)

Now imagine instead that we have random numbers from a standard normal distribution. We know the CDF, \(F_{x}(X)\), of the normal distribution, which gives the probability that X is less than or equal to some value.

To transform X into a uniform random variable, we compute \(U = F_{x}(X)\)

Suppose X = 1. Then \(U = F_{x}(1) = P(Z \leq 1)\), which happens to be about 0.84.
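The same transformation in R: pnorm is F for the standard normal, so normal draws pushed through it become uniform draws (the seed is arbitrary):

## N(0, 1) draws become Unif(0, 1) draws via the CDF
pnorm(1)  # P(Z <= 1), about 0.84
set.seed(5)
z <- rnorm(1e4)
u <- pnorm(z)
hist(u)   # roughly flat on (0, 1)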

Empirical Rule for Normal Distribution