16 Day 15 (July 1)

16.1 Announcements

  • All project proposals have been approved/graded!!

  • In-class work day on Thursday

    • Goal: I’d like to see your data!
  • Recommended reading

    • Chapter 3 in Linear Models with R
  • Selected questions from journals

    • “What exactly are we referencing when we say ‘we gain statistical inference’? I see that in MLE we get to explore CI, p, and standard error. Are these metrics or examples of statistical inference? Or items that we get to apply inference to?”
    • What if the assumptions of my model are wrong?
    • “Is the t-distribution essentially the normal distribution with a larger area under the tails? Otherwise, I cannot grasp the true difference between them, other than you explaining how the t-distribution is better for outliers, and the t-distribution contains degrees of freedom.”
    • Big data and statistical significance!
    • “Also, I finally understand the idea of CI’s crossing 0 being non-significant. Especially when we discussed my model for the unicorn problem after class. A CI that crosses 0 would be desirable in my case, as we want to see that there is no difference between the two treatments.”
    • “I’m confused about what the MLE actually is. In the quail example, is it the -1.15? I get that you still get parameter estimates, so maybe I’m confused about sigma? Now that I’m typing this out I’m not even sure I know what I’m confused about. MLE is just an estimation option that gets you inference as well as estimates, right?”

16.2 Confidence intervals for derived quantities

  • Demonstration of how not to do it.
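
    • The fitted line predicts a mean count of zero (extinction) when \(\hat{\beta}_{0} + \hat{\beta}_{1} \cdot \text{year} = 0\), so the derived quantity of interest is \(\text{year} = -\hat{\beta}_{0}/\hat{\beta}_{1} = 2356.48797/1.15784 \approx 2035.2\).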

    y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37,
        27, 38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27,
        18, 14, 5, 22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
    year <- 1965:2011
    df <- data.frame(y = y, year = year)
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "",
        col = "brown", pch = 20, xlim = c(1965, 2040))

    m1 <- lm(y ~ year, data = df)
    coef(m1)
    ## (Intercept)        year 
    ##  2356.48797    -1.15784
    confint(m1, level = 0.95)
    ##                 2.5 %       97.5 %
    ## (Intercept) 929.80699 3783.1689540
    ## year         -1.87547   -0.4402103
    # This is ok! -beta0/beta1 is the year at which the fitted mean count reaches
    # zero (the expected time of extinction)
    beta0.hat <- as.numeric(coef(m1)[1])
    beta1.hat <- as.numeric(coef(m1)[2])
    extinct.date.hat <- -beta0.hat/beta1.hat
    extinct.date.hat
    ## [1] 2035.245
    # This is not ok!
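    # Why not: beta0.hat and beta1.hat are correlated, and -beta0/beta1 is a
    # non-linear function of both, so pairing the marginal CI endpoints does not
    # give a valid CI for the extinction date (note the absurd interval below).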
    
    beta0.hat.li <- confint(m1)[1,1]
    beta1.hat.li <- confint(m1)[2,1]
    extinct.date.li <- -beta0.hat.li/beta1.hat.li
    extinct.date.li
    ## [1] 495.7729
    beta0.hat.ui <- confint(m1)[1,2]
    beta1.hat.ui <- confint(m1)[2,2]
    extinct.date.ui <- -beta0.hat.ui/beta1.hat.ui
    extinct.date.ui
    ## [1] 8594.004
  • The delta method (see Powell (2007) and Ver Hoef (2012))
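
    • Idea: a first-order Taylor expansion of \(g(\hat{\boldsymbol{\beta}})\) gives the approximation \(\widehat{\text{Var}}\left(g(\hat{\boldsymbol{\beta}})\right) \approx \nabla g^{\prime}\,\hat{\boldsymbol{\Sigma}}\,\nabla g\), where \(\nabla g\) is the gradient of \(g\) evaluated at \(\hat{\boldsymbol{\beta}}\) and \(\hat{\boldsymbol{\Sigma}}\) is the estimated covariance matrix of \(\hat{\boldsymbol{\beta}}\) (vcov(m1) below); this is what msm::deltamethod computes.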

    library(msm)
    
    y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37, 27,
        38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27, 18, 14, 5,
        22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
    year <- 1965:2011
    df <- data.frame(y = y, year = year)
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "", col = "brown",
        pch = 20, xlim = c(1965, 2040))

    m1 <- lm(y ~ year, data = df)
    coef(m1)
    ## (Intercept)        year 
    ##  2356.48797    -1.15784
    confint(m1, level = 0.95)
    ##                 2.5 %       97.5 %
    ## (Intercept) 929.80699 3783.1689540
    ## year         -1.87547   -0.4402103
    beta0.hat <- as.numeric(coef(m1)[1])
    beta1.hat <- as.numeric(coef(m1)[2])
    extinct.date.hat <- -beta0.hat/beta1.hat
    extinct.date.hat
    ## [1] 2035.245
    extinct.date.se <- deltamethod(~-x1/x2, mean = coef(m1), cov = vcov(m1))
    
    extinct.date.ci <- c(extinct.date.hat - 1.96 * extinct.date.se, extinct.date.hat +
        1.96 * extinct.date.se)
    extinct.date.ci
    ## [1] 2005.598 2064.892
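
    • As a check (a sketch, not in the original code): for \(g(\beta_{0}, \beta_{1}) = -\beta_{0}/\beta_{1}\), the gradient is \(\nabla g = (-1/\beta_{1},\; \beta_{0}/\beta_{1}^{2})^{\prime}\), so the standard error can also be computed by hand.

    # Hand-computed delta method SE; assumes the block above has been run.
    # Should match extinct.date.se (about 15.1, the CI half-width divided by 1.96)
    grad <- c(-1/beta1.hat, beta0.hat/beta1.hat^2)
    sqrt(t(grad) %*% vcov(m1) %*% grad)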

16.3 Bootstrap confidence intervals

  • Google Scholar demonstration

  • Motivation (see section 3.6 in Faraway (2014))

    • Functions of parameters (i.e., “derived” quantities)
    • Difficult to obtain the sampling distribution for non-linear functions of estimated parameters
    • Further reading (Hesterberg 2015)
    • Example: When do we expect the population to go extinct?
    y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37, 27,
        38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27, 18, 14, 5,
        22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
    year <- 1965:2011
    df <- data.frame(y = y, year = year)
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "", col = "brown",
        pch = 20, xlim = c(1965, 2040))
    m1 <- lm(y ~ year, data = df)
    abline(m1)

  • Non-parametric bootstrap (see Efron and Tibshirani (1994)).

    1. From a data set with \(n\) observations, take a sample of size \(n\) with replacement. For a linear model, the sample should include both the observed response (\(y_{i}\)) and covariates (\(x_{1}\),\(x_{2}\),…,\(x_{p}\)).
    2. Estimate the parameters for a statistical model using the sampled data from step 1.
    3. Save the estimates of interest. These could be parameters of interest (e.g., \(\boldsymbol{\beta}\)) or a derived quantity (e.g., \(R^{2}\), \(\frac{1}{\boldsymbol{\beta}}\)).
    4. Repeat steps (1)-(3) \(m\) times (see the example in R below).
  • Inference

    • The \(m\) samples of the quantities of interest are samples from the empirical distribution.
    • The empirical distribution can be summarized using sample statistics (e.g., quantiles, mean, variance). Conceptually, this is similar to Monte Carlo integration.
  • Example in R

    library(latex2exp)
    # Number of bootstrap samples (m)
    m.boot <- 1000
    
    # Create a matrix to save the empirical distribution of -beta0.hat/beta1.hat
    # (the expected time of extinction)
    ed.extinct.hat <- matrix(NA, m.boot, 1)
    
    # Set the random seed so results are the same if we run it again; otherwise
    # results would differ due to the random resampling of the data
    set.seed(1940)
    
    # Start for loop for the non-parametric bootstrap
    for (m in 1:m.boot) {
    
        # Sample the data with replacement; boot.sample gives the rows of df that
        # we use for estimation
        boot.sample <- sample(1:nrow(df), replace = TRUE)
    
        # Make temporary data frame that contains the resamples
        df.temp <- df[boot.sample, ]
    
        # Estimate parameters for df.temp (use a temporary name so we do not
        # overwrite m1, the model fit to the observed data)
        m1.temp <- lm(y ~ year, data = df.temp)

        # Save estimate of -beta0.hat/beta1.hat (expected time of extinction)
        ed.extinct.hat[m, ] <- -coef(m1.temp)[1]/coef(m1.temp)[2]
    }
    par(mar = c(5, 4, 7, 2))
    hist(ed.extinct.hat, col = "grey", xlab = "Year", main = TeX("Empirical distribution of $-\\hat{\\beta}_0/\\hat{\\beta}_1$"),
        freq = FALSE, breaks = 20)

    # 95% equal-tailed CI based on percentiles of the empirical distribution
    quantile(ed.extinct.hat, prob = c(0.025, 0.975))
    ##     2.5%    97.5% 
    ## 2021.203 2077.168
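  • Alternative summary (a sketch, not in the original notes): the standard deviation of the empirical distribution is a bootstrap standard error, which gives a normal-approximation CI as a companion to the percentile interval above.

    # Normal-approximation 95% CI using the bootstrap standard error; assumes
    # ed.extinct.hat and df from the blocks above are in the workspace
    se.boot <- sd(ed.extinct.hat)
    m1 <- lm(y ~ year, data = df)  # refit to the observed data
    est <- as.numeric(-coef(m1)[1]/coef(m1)[2])  # point estimate (2035.245)
    est + c(-1.96, 1.96) * se.boot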
  • Example in R using the mosaic package

    library(mosaic)
    
    set.seed(1940)
    bootstrap <- do(1000) * coef(lm(y ~ year, data = resample(df)))
    head(bootstrap)
    ##   Intercept       year
    ## 1  1703.401 -0.8291862
    ## 2  3017.102 -1.4887107
    ## 3  2683.814 -1.3195092
    ## 4  2595.994 -1.2778627
    ## 5  2334.905 -1.1463994
    ## 6  2011.819 -0.9811046
    # 95% equal-tailed CI based on percentiles of the empirical distribution
    quantile(-bootstrap[, 1]/bootstrap[, 2], prob = c(0.025, 0.975))
    ##     2.5%    97.5% 
    ## 2021.203 2077.168
    confint(bootstrap, method = "percentile")  # Comparison of 95% CI with traditional approach
    ##   name     lower      upper level     method estimate
    ## 1 year -1.623489 -0.6753815  0.95 percentile -1.15784
    confint(m1)[2, ]  # m1 here is the model fit to the observed data
    ##      2.5 %     97.5 % 
    ## -1.8754697 -0.4402103
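  • Note the contrast: the percentile bootstrap CI for the extinction date (2021.2, 2077.2) is asymmetric about the point estimate (2035.2), reflecting the skew of the empirical distribution, whereas the delta method CI (2005.6, 2064.9) is symmetric by construction.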

Literature cited

Efron, Bradley, and Robert J. Tibshirani. 1994. An Introduction to the Bootstrap. CRC Press.
Faraway, J. J. 2014. Linear Models with R. CRC Press.
Hesterberg, Tim C. 2015. “What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.” The American Statistician 69 (4): 371–86.
Powell, Larkin A. 2007. “Approximating Variance of Demographic Parameters Using the Delta Method: A Reference for Avian Biologists.” The Condor 109 (4): 949–54.
Ver Hoef, Jay M. 2012. “Who Invented the Delta Method?” The American Statistician 66 (2): 124–27.