19 Day 18 (July 7)
19.1 Announcements
Assignment 4 is posted
Recommended reading
- Chapter 4 in Linear Models with R
Selected questions from journals
- It is ok to do statistical modeling on statistics
- “In my lab we don’t use confidence intervals often and I am not sure why; do you have an idea why we wouldn’t use these to present our data?”
- “Also, I realized today that an MLE point estimate is not necessarily the middle value of a confidence interval. For some reason I’ve been assuming that this whole time but that’s not the case.”
- “Today I learned that Bayesian models don’t contain confidence intervals, but they contain distributional assumptions for each unknown within the model. I think this is a great method to extrapolate… One question I have is how to know how accurate a model is without data. Adding a ton of assumptions to a model could improve it, but it can also have the opposite effect, right? Would you use several different methods to extrapolate and confirm? In what way would you apply this?”
- “Something I’m struggling to understand is the difference between a confidence interval and a prediction interval. I know they both give a range of values, but I’m not sure when to use each one, and why prediction intervals are wider than confidence intervals.”
19.2 Bootstrap confidence intervals
Google Scholar demonstration
Motivation (see section 3.6 in Faraway (2014))
- Functions of parameters (i.e., “derived” quantities)
- Difficult to obtain the sampling distribution for non-linear functions of estimated parameters
- Further reading (Hesterberg 2015)
- Example: When do we expect the population to go extinct?
y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37,
    27, 38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27, 18,
    14, 5, 22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
year <- 1965:2011
df <- data.frame(y = y, year = year)
plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "",
    col = "brown", pch = 20, xlim = c(1965, 2040))
m1 <- lm(y ~ year)
abline(m1)
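Setting the fitted line equal to zero gives a point estimate of the extinction year, the derived quantity \(-\hat{\beta}_0/\hat{\beta}_1\) that is bootstrapped below:
# Point estimate of the expected extinction year: -beta0.hat/beta1.hat
-coef(m1)[1]/coef(m1)[2]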
Non-parametric bootstrap (see Efron and Tibshirani (1994)).
1. From a data set with \(n\) observations, take a sample of size \(n\) with replacement. For a linear model, the sample should include both the observed response (\(y_{i}\)) and the covariates (\(x_{1},x_{2},\ldots,x_{p}\)).
2. Estimate the parameters of a statistical model using the sampled data from step 1.
3. Save the estimates of interest. These could be parameters (e.g., \(\boldsymbol{\beta}\)) or a derived quantity (e.g., \(R^{2}\), \(\frac{1}{\boldsymbol{\beta}}\)).
4. Repeat steps (1)-(3) \(m\) times.
Inference
- The \(m\) saved estimates are draws from the empirical (bootstrap) distribution of the quantity of interest.
- The empirical distribution can be summarized using sample statistics (e.g., quantiles, mean, variance). Conceptually, this is similar to Monte Carlo integration.
Example in R
library(latex2exp)

# Number of bootstrap samples (m)
m.boot <- 1000

# Create matrix to save empirical distribution of -beta0.hat/beta1.hat
# (expected time of extinction)
ed.extinct.hat <- matrix(, m.boot, 1)

# Set random seed so results are the same if we run it again. Results would
# otherwise differ due to random resampling of the data.
set.seed(1940)

# Start for loop for non-parametric bootstrap
for (m in 1:m.boot) {
    # Sample data with replacement. boot.sample gives the rows of df that we
    # use for estimation.
    boot.sample <- sample(1:nrow(df), replace = TRUE)
    # Make temporary data frame that contains the resamples
    df.temp <- df[boot.sample, ]
    # Estimate parameters using df.temp
    m1 <- lm(y ~ year, data = df.temp)
    # Save estimate of -beta0.hat/beta1.hat (expected time of extinction)
    ed.extinct.hat[m, ] <- -coef(m1)[1]/coef(m1)[2]
}

par(mar = c(5, 4, 7, 2))
hist(ed.extinct.hat, col = "grey", xlab = "Year",
    main = TeX("Empirical distribution of $-\\frac{\\hat{\\beta}_0}{\\hat{\\beta}_1}$"),
    freq = FALSE, breaks = 20)
# 95% equal-tailed CI based on percentiles of the empirical distribution
quantile(ed.extinct.hat, prob = c(0.025, 0.975))
##     2.5%    97.5%
## 2021.203 2077.168
Example in R using the mosaic package
library(mosaic)
set.seed(1940)
bootstrap <- do(1000) * coef(lm(y ~ year, data = resample(df)))
head(bootstrap)
##   Intercept       year
## 1  1703.401 -0.8291862
## 2  3017.102 -1.4887107
## 3  2683.814 -1.3195092
## 4  2595.994 -1.2778627
## 5  2334.905 -1.1463994
## 6  2011.819 -0.9811046
# 95% equal-tailed CI based on percentiles of the empirical distribution
quantile(-bootstrap[, 1]/bootstrap[, 2], prob = c(0.025, 0.975))
##     2.5%    97.5%
## 2021.203 2077.168
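An interval for the slope itself can also be computed directly from the bootstrap replicates; a minimal sketch, assuming the output below comes from mosaic's confint method (the parm argument, an assumption here, restricts the output to the slope):
# Percentile-based 95% CI for the slope from the bootstrap replicates
confint(bootstrap, parm = "year", level = 0.95, method = "percentile")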
##   name     lower      upper level     method estimate
## 1 year -1.623489 -0.6753815  0.95 percentile -1.15784
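For comparison, the next output looks like a normal-theory interval for the slope; a sketch, assuming it comes from confint applied to a fit of the original (non-resampled) data:
# Normal-theory 95% CI for the slope from the original data
confint(lm(y ~ year, data = df))["year", ]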
##      2.5 %     97.5 %
## -1.9107236 -0.5405716
Prediction
My definition of inference and prediction
- Inference = Learning about what you can’t observe given what you did observe (and some assumptions)
- Prediction = Learning about what you didn’t observe given what you did observe (and some assumptions)
Prediction (Ch. 4 in Faraway (2014))
- Derived quantity
- \(\mathbf{x}^{\prime}_0\) is a \(1\times p\) matrix of covariates (could be a row of \(\mathbf{X}\) or completely new values of the predictors)
- Use \(\widehat{\text{E}(y_0)}=\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}\)
- Example
- Predicting the number of whooping cranes
url <- "https://www.dropbox.com/s/8grip274233dr9a/Butler%20et%20al.%20Table%201.csv?dl=1" df1 <- read.csv(url) plot(df1$Winter, df1$N, xlab = "Year", ylab = "Population size", xlim = c(1940, 2025), ylim = c(0, 300), typ = "b", lwd = 1.5, pch = "*")
- What model should we use?
m1 <- lm(N ~ Winter + I(Winter^2), data = df1)
Ey.hat <- predict(m1)
plot(df1$Winter, df1$N, xlab = "Year", ylab = "Population size",
    xlim = c(1940, 2021), ylim = c(0, 300), typ = "b", lwd = 1.5, pch = "*")
points(df1$Winter, Ey.hat, typ = "l", col = "red")
- Derived quantity (a sketch follows below)
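For example, the expected population size in a year beyond the observed data is a derived quantity; a minimal sketch using the quadratic fit m1 and a hypothetical prediction year of 2025:
# Expected value of y0 at a new year (2025 is a hypothetical choice)
x0 <- c(1, 2025, 2025^2)  # intercept, Winter, Winter^2
sum(x0 * coef(m1))        # x0' beta.hat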
19.3 Intervals for predictions
- Expected value and variance of \(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}\): \(\text{E}(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}})=\mathbf{x}^{\prime}_0\boldsymbol{\beta}\) and \(\text{Var}(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}})=\sigma^{2}\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0}\)
- Confidence interval \(\text{P}\left(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}-t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^{2}\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0}}<\mathbf{x}^{\prime}_0\boldsymbol{\beta}<\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}+t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^{2}\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0}}\right)=1-\alpha\)
- The 95% CI is \(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}\pm t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^{2}\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0}}\)
- In R
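The output and plot that follow use a matrix Ey.hat with columns fit, lwr, and upr; a minimal sketch of the call that produces it, assuming predict.lm with interval = "confidence":
# 95% CI for E(y0) at each observed covariate value
Ey.hat <- predict(m1, interval = "confidence")
head(Ey.hat)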
##        fit      lwr      upr
## 1 29.35980 21.43774 37.28186
## 2 28.84122 21.42928 36.25315
## 3 28.47610 21.54645 35.40575
## 4 28.26444 21.78812 34.74075
## 5 28.20624 22.15308 34.25940
## 6 28.30150 22.64000 33.96300
plot(df1$Winter, df1$N, xlab = "Year", ylab = "Population size",
    xlim = c(1940, 2025), ylim = c(0, 300), typ = "b", lwd = 1.5, pch = "*")
points(df1$Winter, Ey.hat[, 1], typ = "l", col = "red")
points(df1$Winter, Ey.hat[, 2], typ = "l", col = "red", lty = 2)
points(df1$Winter, Ey.hat[, 3], typ = "l", col = "red", lty = 2)
- Why are there so many data points that fall outside of the 95% CIs?
- Prediction intervals vs. Confidence intervals
CIs for \(\widehat{\text{E}(y_0)}=\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}\)
How to interpret \(\widehat{\text{E}(y_0)}\)
What if I wanted to predict \(y_0\)?
- \(y_0\sim\text{N}(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}},\hat{\sigma}^2)\)
Expected value and variance of \(y_0\): \(\text{E}(y_0)=\mathbf{x}^{\prime}_0\boldsymbol{\beta}\) and \(\text{Var}(y_0-\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}})=\sigma^{2}(1+\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0})\), which accounts for both the uncertainty in \(\hat{\boldsymbol{\beta}}\) and the variability of a new observation
Prediction interval \(\text{P}\left(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}-t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^{2}(1+\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0})}<y_0<\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}+t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^{2}(1+\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0})}\right)=1-\alpha\)
The 95% PI is \(\mathbf{x}^{\prime}_0\hat{\boldsymbol{\beta}}\pm t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^{2}(1+\mathbf{x}^{\prime}_{0}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{x}_{0})}\)
Example in R
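As before, the output and plot below use a matrix y.hat; a minimal sketch of the call, assuming predict.lm with interval = "prediction":
# 95% PI for y0 at each observed covariate value
y.hat <- predict(m1, interval = "prediction")
head(y.hat)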
##        fit      lwr      upr
## 1 29.35980 6.630220 52.08938
## 2 28.84122 6.284368 51.39807
## 3 28.47610 6.073089 50.87910
## 4 28.26444 5.997480 50.53139
## 5 28.20624 6.058655 50.35382
## 6 28.30150 6.257741 50.34526
plot(df1$Winter, df1$N, xlab = "Year", ylab = "Population size",
    xlim = c(1940, 2025), ylim = c(0, 300), typ = "b", lwd = 1.5, pch = "*")
points(df1$Winter, y.hat[, 1], typ = "l", col = "red")
points(df1$Winter, y.hat[, 2], typ = "l", col = "red", lty = 2)
points(df1$Winter, y.hat[, 3], typ = "l", col = "red", lty = 2)