16 Day 15 (July 1)

16.1 Announcements

  • All project proposals have been approved/graded!!

  • In-class work day on Thursday

    • Goal: I’d like to see your data!
  • Recommended reading

    • Chapter 3 in Linear Models with R
  • Selected questions from journals

    • “What exactly are we referencing when we say ‘we gain statistical inference’? I see that in MLE we get to explore CI, p, and standard error. Are these metrics or examples of statistical inference? Or items that we get to apply inference to?”
    • What if the assumptions of my model are wrong?
    • “Is the t-distribution essentially the normal distribution with a larger area under the tails? Otherwise, I cannot grasp the true difference between them, other than you explaining how the t-distribution is better for outliers, and the t-distribution contains degrees of freedom.”
    • Big data and statistical significance!
    • “Also, I finally understand the idea of CI’s crossing 0 being non-significant. Especially when we discussed my model for the unicorn problem after class. A CI that crosses 0 would be desirable in my case, as we want to see that there is no difference between the two treatments.”
    • “I’m confused about what the MLE actually is. In the quail example, is it the -1.15? I get that you still get parameter estimates, so maybe I’m confused about sigma? Now that I’m typing this out I’m not even sure I know what I’m confused about. MLE is just an estimation option that gets you inference as well as estimates, right?”

16.2 Confidence intervals for derived quantities

  • Demonstration of how not to do it.
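
    • The fitted line predicts a mean count of zero (extinction) when \(\hat{\beta}_{0} + \hat{\beta}_{1} \cdot \text{year} = 0\), so the derived quantity of interest is \(\text{year} = -\hat{\beta}_{0}/\hat{\beta}_{1} = 2356.48797/1.15784 \approx 2035.2\).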

    y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37,
        27, 38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27,
        18, 14, 5, 22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
    year <- 1965:2011
    df <- data.frame(y = y, year = year)
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "",
        col = "brown", pch = 20, xlim = c(1965, 2040))

    m1 <- lm(y ~ year, data = df)
    coef(m1)
    ## (Intercept)        year 
    ##  2356.48797    -1.15784
    confint(m1, level = 0.95)
    ##                 2.5 %       97.5 %
    ## (Intercept) 929.80699 3783.1689540
    ## year         -1.87547   -0.4402103
    # This is ok! -beta0/beta1 is the year at which the fitted mean count reaches
    # zero (the expected time of extinction)
    beta0.hat <- as.numeric(coef(m1)[1])
    beta1.hat <- as.numeric(coef(m1)[2])
    extinct.date.hat <- -beta0.hat/beta1.hat
    extinct.date.hat
    ## [1] 2035.245
    # This is not ok!
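    # Why not: beta0.hat and beta1.hat are correlated, and -beta0/beta1 is a
    # non-linear function of both, so pairing the marginal CI endpoints does not
    # give a valid CI for the extinction date (note the absurd interval below).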
    
    beta0.hat.li <- confint(m1)[1,1]
    beta1.hat.li <- confint(m1)[2,1]
    extinct.date.li <- -beta0.hat.li/beta1.hat.li
    extinct.date.li
    ## [1] 495.7729
    beta0.hat.ui <- confint(m1)[1,2]
    beta1.hat.ui <- confint(m1)[2,2]
    extinct.date.ui <- -beta0.hat.ui/beta1.hat.ui
    extinct.date.ui
    ## [1] 8594.004
  • The delta method (see Powell (2007) and Ver Hoef (2012))
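
    • Idea: a first-order Taylor expansion of \(g(\hat{\boldsymbol{\beta}})\) gives the approximation \(\widehat{\text{Var}}\left(g(\hat{\boldsymbol{\beta}})\right) \approx \nabla g^{\prime}\,\hat{\boldsymbol{\Sigma}}\,\nabla g\), where \(\nabla g\) is the gradient of \(g\) evaluated at \(\hat{\boldsymbol{\beta}}\) and \(\hat{\boldsymbol{\Sigma}}\) is the estimated covariance matrix of \(\hat{\boldsymbol{\beta}}\) (vcov(m1) below); this is what msm::deltamethod computes.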

    library(msm)
    
    y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37, 27,
        38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27, 18, 14, 5,
        22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
    year <- 1965:2011
    df <- data.frame(y = y, year = year)
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "", col = "brown",
        pch = 20, xlim = c(1965, 2040))

    m1 <- lm(y ~ year, data = df)
    coef(m1)
    ## (Intercept)        year 
    ##  2356.48797    -1.15784
    confint(m1, level = 0.95)
    ##                 2.5 %       97.5 %
    ## (Intercept) 929.80699 3783.1689540
    ## year         -1.87547   -0.4402103
    beta0.hat <- as.numeric(coef(m1)[1])
    beta1.hat <- as.numeric(coef(m1)[2])
    extinct.date.hat <- -beta0.hat/beta1.hat
    extinct.date.hat
    ## [1] 2035.245
    extinct.date.se <- deltamethod(~-x1/x2, mean = coef(m1), cov = vcov(m1))
    
    extinct.date.ci <- c(extinct.date.hat - 1.96 * extinct.date.se, extinct.date.hat +
        1.96 * extinct.date.se)
    extinct.date.ci
    ## [1] 2005.598 2064.892
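
    • As a check (a sketch, not in the original code): for \(g(\beta_{0}, \beta_{1}) = -\beta_{0}/\beta_{1}\), the gradient is \(\nabla g = (-1/\beta_{1},\; \beta_{0}/\beta_{1}^{2})^{\prime}\), so the standard error can also be computed by hand.

    # Hand-computed delta method SE; assumes the block above has been run.
    # Should match extinct.date.se (about 15.1, the CI half-width divided by 1.96)
    grad <- c(-1/beta1.hat, beta0.hat/beta1.hat^2)
    sqrt(t(grad) %*% vcov(m1) %*% grad)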

16.3 Bootstrap confidence intervals

  • Google Scholar demonstration

  • Motivation (see section 3.6 in Faraway (2014))

    • Functions of parameters (i.e., “derived” quantities)
    • Difficult to obtain the sampling distribution for non-linear functions of estimated parameters
    • Further reading (Hesterberg 2015)
    • Example: When do we expect the population to go extinct?
    y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37, 27,
        38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27, 18, 14, 5,
        22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
    year <- 1965:2011
    df <- data.frame(y = y, year = year)
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "", col = "brown",
        pch = 20, xlim = c(1965, 2040))
    m1 <- lm(y ~ year, data = df)
    abline(m1)

  • Non-parametric bootstrap (see Efron and Tibshirani (1994)).

    1. From a data set with \(n\) observations, take a sample of size \(n\) with replacement. For a linear model, the sample should include both the observed response (\(y_{i}\)) and covariates (\(x_{1}\),\(x_{2}\),…,\(x_{p}\)).
    2. Estimate the parameters for a statistical model using the sampled data from step 1.
    3. Save the estimates of interest. These could be parameters of interest (e.g., \(\boldsymbol{\beta}\)) or a derived quantity (e.g., \(R^{2}\), \(\frac{1}{\boldsymbol{\beta}}\)).
    4. Repeat steps (1)-(3) \(m\) times (see the example in R below).
  • Inference

    • The \(m\) samples of the quantities of interest are samples from the empirical distribution.
    • The empirical distribution can be summarized using sample statistics (e.g., quantiles, mean, variance). Conceptually, this is similar to Monte Carlo integration.
  • Example in R

    library(latex2exp)
    # Number of bootstrap samples (m)
    m.boot <- 1000
    
    # Create a matrix to save the empirical distribution of -beta0.hat/beta1.hat
    # (the expected time of extinction)
    ed.extinct.hat <- matrix(NA, m.boot, 1)
    
    # Set the random seed so results are the same if we run it again; otherwise
    # results would differ due to the random resampling of the data
    set.seed(1940)
    
    # Start for loop for the non-parametric bootstrap
    for (m in 1:m.boot) {
    
        # Sample the data with replacement; boot.sample gives the rows of df that
        # we use for estimation
        boot.sample <- sample(1:nrow(df), replace = TRUE)
    
        # Make temporary data frame that contains the resamples
        df.temp <- df[boot.sample, ]
    
        # Estimate parameters for df.temp (use a temporary name so we do not
        # overwrite m1, the model fit to the observed data)
        m1.temp <- lm(y ~ year, data = df.temp)

        # Save estimate of -beta0.hat/beta1.hat (expected time of extinction)
        ed.extinct.hat[m, ] <- -coef(m1.temp)[1]/coef(m1.temp)[2]
    }
    par(mar = c(5, 4, 7, 2))
    hist(ed.extinct.hat, col = "grey", xlab = "Year", main = TeX("Empirical distribution of $-\\hat{\\beta}_0/\\hat{\\beta}_1$"),
        freq = FALSE, breaks = 20)

    # 95% equal-tailed CI based on percentiles of the empirical distribution
    quantile(ed.extinct.hat, prob = c(0.025, 0.975))
    ##     2.5%    97.5% 
    ## 2021.203 2077.168
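  • Alternative summary (a sketch, not in the original notes): the standard deviation of the empirical distribution is a bootstrap standard error, which gives a normal-approximation CI as a companion to the percentile interval above.

    # Normal-approximation 95% CI using the bootstrap standard error; assumes
    # ed.extinct.hat and df from the blocks above are in the workspace
    se.boot <- sd(ed.extinct.hat)
    m1 <- lm(y ~ year, data = df)  # refit to the observed data
    est <- as.numeric(-coef(m1)[1]/coef(m1)[2])  # point estimate (2035.245)
    est + c(-1.96, 1.96) * se.boot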
  • Example in R using the mosaic package

    library(mosaic)
    
    set.seed(1940)
    bootstrap <- do(1000) * coef(lm(y ~ year, data = resample(df)))
    head(bootstrap)
    ##   Intercept       year
    ## 1  1703.401 -0.8291862
    ## 2  3017.102 -1.4887107
    ## 3  2683.814 -1.3195092
    ## 4  2595.994 -1.2778627
    ## 5  2334.905 -1.1463994
    ## 6  2011.819 -0.9811046
    # 95% equal-tailed CI based on percentiles of the empirical distribution
    quantile(-bootstrap[, 1]/bootstrap[, 2], prob = c(0.025, 0.975))
    ##     2.5%    97.5% 
    ## 2021.203 2077.168
    confint(bootstrap, method = "percentile")  # Comparison of 95% CI with traditional approach
    ##   name     lower      upper level     method estimate
    ## 1 year -1.623489 -0.6753815  0.95 percentile -1.15784
    confint(m1)[2, ]  # m1 here is the model fit to the observed data
    ##      2.5 %     97.5 % 
    ## -1.8754697 -0.4402103
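  • Note the contrast: the percentile bootstrap CI for the extinction date (2021.2, 2077.2) is asymmetric about the point estimate (2035.2), reflecting the skew of the empirical distribution, whereas the delta method CI (2005.6, 2064.9) is symmetric by construction.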

Literature cited

Efron, Bradley, and Robert J. Tibshirani. 1994. An Introduction to the Bootstrap. CRC Press.
Faraway, J. J. 2014. Linear Models with R. CRC Press.
Hesterberg, Tim C. 2015. “What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.” The American Statistician 69 (4): 371–86.
Powell, Larkin A. 2007. “Approximating Variance of Demographic Parameters Using the Delta Method: A Reference for Avian Biologists.” The Condor 109 (4): 949–54.
Ver Hoef, Jay M. 2012. “Who Invented the Delta Method?” The American Statistician 66 (2): 124–27.