36 Day 35 (July 30)

36.1 Announcements

  • Presentations are going well!

  • Selected questions from journals

    • General question about collinearity: every method is sensitive to collinearity
    • “I understand that adding more assumptions should give us something back to make it worth it. And that’s not always the case with random effects. But how do we explain this to professors who are used to using them? The main thing we often hear is that if we consider something like location and block(location) as a random effect then we can make broader inferences, for example, to all of Kansas. That’s the main reason I’ve heard for why we should consider random effects.”

36.2 Random effects

  • Conditional, marginal, and joint distributions

    • Conditional distribution of \(\mathbf{y}\) given \(\boldsymbol{\beta}\): \[\mathbf{y}|\boldsymbol{\beta}\sim\text{N}(\mathbf{X}\boldsymbol{\beta},\sigma_{\varepsilon}^{2}\mathbf{I})\] This is also sometimes called the data model.
    • Marginal distribution of \(\boldsymbol{\beta}\): \[\boldsymbol{\beta}\sim\text{N}(\mathbf{0},\sigma_{\beta}^{2}\mathbf{I})\] This is also called the parameter model.
    • Joint distribution \([\mathbf{y},\boldsymbol{\beta}]=[\mathbf{y}|\boldsymbol{\beta}][\boldsymbol{\beta}]\)
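    • A minimal simulation sketch (added for illustration; the sizes and variances are arbitrary choices): drawing \(\boldsymbol{\beta}\) from the parameter model and then \(\mathbf{y}\) from the data model is exactly a draw from the joint distribution \([\mathbf{y}|\boldsymbol{\beta}][\boldsymbol{\beta}]\).
    # Sketch: simulate one draw from the joint distribution [y|beta][beta]
    set.seed(2938)
    n <- 50
    p <- 5
    sigma2.eps <- 1  # variance of the data model
    sigma2.beta <- 4  # variance of the parameter model
    X <- matrix(rnorm(n * p), n, p)
    
    beta <- rnorm(p, mean = 0, sd = sqrt(sigma2.beta))  # parameter model: beta ~ N(0, sigma2.beta*I)
    y <- rnorm(n, mean = X %*% beta, sd = sqrt(sigma2.eps))  # data model: y|beta ~ N(X*beta, sigma2.eps*I)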
  • Estimation for the linear model with random effects

    • By maximizing the joint likelihood
      • Write out the likelihood function \[\mathcal{L}(\boldsymbol{\beta},\sigma_{\varepsilon}^{2},\sigma_{\beta}^{2})=(2\pi)^{-\frac{n}{2}}|\sigma^{2}_{\varepsilon}\mathbf{I}|^{-\frac{1}{2}}\text{exp}\bigg(-\frac{1}{2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\prime}(\sigma^{2}_{\varepsilon}\mathbf{I})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\bigg)\times\\(2\pi)^{-\frac{p}{2}}|\sigma^{2}_{\beta}\mathbf{I}|^{-\frac{1}{2}}\text{exp}\bigg(-\frac{1}{2}\boldsymbol{\beta}^{\prime}(\sigma^{2}_{\beta}\mathbf{I})^{-1}\boldsymbol{\beta}\bigg)\]
      • Maximizing the log-likelihood function w.r.t. \(\boldsymbol{\beta}\) gives \[\hat{\boldsymbol{\beta}}=\big(\mathbf{X}^{\prime}\mathbf{X}+\frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{\beta}}\mathbf{I}\big)^{-1}\mathbf{X}^{\prime}\mathbf{y}\]
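      • As a quick numerical check (a sketch added here, reusing the simulated \(\mathbf{X}\) and \(\mathbf{y}\) from above), the closed-form estimator can be compared to ordinary least squares; the random-effects estimates are pulled toward zero.
    # Sketch: compare the random-effects (ridge-type) estimator to ordinary least
    # squares using X, y, sigma2.eps, and sigma2.beta from the simulation above
    lambda <- sigma2.eps/sigma2.beta
    beta.re <- solve(t(X) %*% X + lambda * diag(p)) %*% t(X) %*% y
    beta.ols <- solve(t(X) %*% X) %*% t(X) %*% y
    cbind(random.effect = as.vector(beta.re), ols = as.vector(beta.ols))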
  • Example eye temperature data set for Eurasian blue tit

    # Read in the eye temperature data and label the columns
    df.et <- read.table("https://www.dropbox.com/s/bo11n66cezdnbia/eyetemp.txt?dl=1")
    names(df.et) <- c("id", "sex", "date", "time", "event", "airtemp", "relhum", "sun",
        "bodycond", "eyetemp")
    df.et$id <- as.factor(df.et$id)
    df.et$id.num <- as.numeric(df.et$id)
    
    # Plot eye temperature by individual and mark each individual's sample mean
    ind.means <- aggregate(eyetemp ~ id.num, df.et, FUN = base::mean)
    plot(df.et$id.num, df.et$eyetemp, xlab = "Individual", ylab = "Eye temperature")
    points(ind.means, pch = "x", col = "gold", cex = 1.5)

    # Standard 'fixed effects' linear regression model
    m1 <- lm(eyetemp ~ as.factor(id.num) - 1, data = df.et)
    summary(m1)
    ## 
    ## Call:
    ## lm(formula = eyetemp ~ as.factor(id.num) - 1, data = df.et)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -3.7378 -0.5750  0.1507  0.6622  3.0286 
    ## 
    ## Coefficients:
    ##                     Estimate Std. Error t value Pr(>|t|)    
    ## as.factor(id.num)1   29.3423     0.1972  148.77   <2e-16 ***
    ## as.factor(id.num)2   28.8735     0.1725  167.41   <2e-16 ***
    ## as.factor(id.num)3   27.5458     0.2053  134.19   <2e-16 ***
    ## as.factor(id.num)4   28.0156     0.1778  157.59   <2e-16 ***
    ## as.factor(id.num)5   30.8188     0.1211  254.56   <2e-16 ***
    ## as.factor(id.num)6   32.6486     0.1700  192.06   <2e-16 ***
    ## as.factor(id.num)7   29.6800     0.3180   93.33   <2e-16 ***
    ## as.factor(id.num)8   29.0000     0.2789  103.97   <2e-16 ***
    ## as.factor(id.num)9   29.4378     0.1499  196.36   <2e-16 ***
    ## as.factor(id.num)10  29.7667     0.1836  162.12   <2e-16 ***
    ## as.factor(id.num)11  28.4714     0.2195  129.74   <2e-16 ***
    ## as.factor(id.num)12  28.8793     0.1867  154.64   <2e-16 ***
    ## as.factor(id.num)13  30.1500     0.7111   42.40   <2e-16 ***
    ## as.factor(id.num)14  30.6500     0.7111   43.10   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 1.006 on 358 degrees of freedom
    ## Multiple R-squared:  0.9989, Adjusted R-squared:  0.9989 
    ## F-statistic: 2.31e+04 on 14 and 358 DF,  p-value: < 2.2e-16
    plot(df.et$id.num, df.et$eyetemp, xlab = "Individual", ylab = "Expected eye temperature",
        col = "white")
    points(1:14, coef(m1), pch = 20, col = "black", cex = 1)
    segments(1:14, confint(m1)[, 1], 1:14, confint(m1)[, 2], lwd = 2, lty = 1)

    # Random effects regression model (using nlme package; difficult to get
    # confidence interval)
    library(nlme)
    m2 <- lme(fixed = eyetemp ~ 1, random = list(id = pdIdent(~id - 1)), data = df.et,
        method = "ML")  #summary(m2)
    # predict(m2,level=1,newdata=data.frame(id=unique(df.et$id)))
    
    # Random effects regression model (mgcv package)
    library(mgcv)
    m2 <- gam(eyetemp ~ 1 + s(id, bs = "re"), data = df.et, method = "ML")
    
    plot(df.et$id.num, df.et$eyetemp, xlab = "Individual", ylab = "Expected eye temperature",
        col = "white")
    points(1:14 - 0.15, coef(m1), pch = 20, col = "black", cex = 1)
    segments(1:14 - 0.15, confint(m1)[, 1], 1:14 - 0.15, confint(m1)[, 2], lwd = 2, lty = 1)
    pred <- predict(m2, newdata = data.frame(id = unique(df.et$id)), se.fit = TRUE)
    pred$lci <- pred$fit - 1.96 * pred$se.fit
    pred$uci <- pred$fit + 1.96 * pred$se.fit
    points(1:14 + 0.15, pred$fit, pch = 20, col = "green", cex = 1)
    segments(1:14 + 0.15, pred$lci, 1:14 + 0.15, pred$uci, lwd = 2, lty = 1, col = "green")
    abline(a = coef(m2)[1], b = 0, col = "grey")
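    • The estimated variance components can also be inspected. A brief sketch, assuming the mgcv helper gam.vcomp() applied to the m2 fit above; it reports \(\hat{\sigma}_{\beta}\) and \(\hat{\sigma}_{\varepsilon}\) on the standard deviation scale.
    # Sketch: estimated standard deviations of the random effect (sigma.beta) and
    # the residual error (sigma.eps) from the mgcv fit above
    gam.vcomp(m2)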

  • Summary

    • Take home message
      • Treating \(\boldsymbol{\beta}\) as a random effect “shrinks” the estimates of \(\boldsymbol{\beta}\) closer to zero.
    • Requires an extra assumption, \(\boldsymbol{\beta}\sim\text{N}(\mathbf{0},\sigma_{\beta}^{2}\mathbf{I})\), which goes by several names
      • Penalty (penalized regression)
      • Random effect (mixed models)
      • Prior distribution (Bayesian statistics)
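      • A quick numerical illustration of the penalty view (a sketch reusing the simulated data from above): minimizing \((\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^{\prime}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})+\frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{\beta}}\boldsymbol{\beta}^{\prime}\boldsymbol{\beta}\) gives the same estimate as the closed-form expression.
    # Sketch: the penalty view. Minimizing the penalized sum of squares recovers the
    # closed-form random-effects estimator (uses X, y, lambda, and beta.re from above)
    pen.ss <- function(beta) sum((y - X %*% beta)^2) + lambda * sum(beta^2)
    beta.pen <- optim(par = rep(0, p), fn = pen.ss, method = "BFGS")$par
    cbind(penalized = beta.pen, closed.form = as.vector(beta.re))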

36.3 Generalized linear models

  • The generalized linear model framework \[\mathbf{y}\sim[\mathbf{y}|\boldsymbol{\mu},\psi]\] \[g(\boldsymbol{\mu})=\mathbf{X}\boldsymbol{\beta}\]

    • \([\mathbf{y}|\boldsymbol{\mu},\psi]\) is a distribution from the exponential family (e.g., normal, Poisson, binomial) with expected value \(\boldsymbol{\mu}\) and dispersion parameter \(\psi\).
      • Example \([\mathbf{y}|\boldsymbol{\mu},\psi]=\text{N}\left(\boldsymbol{\mu},\psi\mathbf{I}\right)\)
      • Example \([\mathbf{y}|\boldsymbol{\mu},\psi]=\text{Poisson}\left(\boldsymbol{\mu}\right)\)
      • Example \([\mathbf{y}|\boldsymbol{\mu},\psi]=\text{gamma}\left(\boldsymbol{\mu},\psi\right)\)
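      • A small illustrative sketch (simulated data, not one of the class data sets): a Poisson response with a log link, \(\log(\boldsymbol{\mu})=\mathbf{X}\boldsymbol{\beta}\), fit with glm().
    # Sketch: a Poisson GLM with a log link on simulated data
    set.seed(101)
    x <- runif(100, 0, 2)
    mu <- exp(0.5 + 1.2 * x)  # log link: log(mu) = beta0 + beta1*x
    y.pois <- rpois(100, lambda = mu)
    summary(glm(y.pois ~ x, family = poisson(link = "log")))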
  • Estimation

    • Use numerical maximization to estimate \(\boldsymbol{\beta}\)
    • \(\hat{\boldsymbol{\beta}}=(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{y}\) is NOT the MLE for the glm
    • Inference is based on asymptotic properties of the MLE (a Wald interval sketch follows the optim() code below).
    • Example: Challenger data
      • The statistical model: \[\mathbf{y}\sim\text{binomial}(6,\mathbf{p})\] \[\text{logit}(\mathbf{p})=\mathbf{X}\boldsymbol{\beta}.\]
      • The PMF for the binomial distribution \[\binom{N}{y}p^{y}(1-p)^{N-y}\]
      • Write out the likelihood function \[\mathcal{L}(\boldsymbol{\beta})=\prod_{i=1}^n\binom{6}{y_i}p_i^{y_i}(1-p_i)^{6-y_i}\]
    • Maximize the log-likelihood function
    challenger <- read.csv("https://www.dropbox.com/s/ezxj8d48uh7lzhr/challenger.csv?dl=1")
    
    plot(challenger$Temp,challenger$O.ring,xlab="Temperature",ylab="Number of incidents",xlim=c(10,80),ylim=c(0,6))    

    y <- challenger$O.ring
    X <- model.matrix(~Temp,data= challenger)
    
    nll <- function(beta){
      p <- 1/(1+exp(-X%*%beta)) # Inverse of logit link function
      -sum(dbinom(y,6,p,log=TRUE)) # Sum of the log likelihood for each observation 
      }
    est <- optim(par=c(0,0),fn=nll,hessian=TRUE)
    
    # MLE of beta
    est$par
    ## [1]  6.751192 -0.139703
    # Var(beta.hat)
    diag(solve(est$hessian))
    ## [1] 8.830866682 0.002145791
    # Std.Error
    diag(solve(est$hessian))^0.5
    ## [1] 2.97167742 0.04632268
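    • Wald-type 95% confidence intervals based on the asymptotic normality of the MLE (a sketch using est from the optim() call above)
    # Sketch: Wald-type 95% CIs from the asymptotic normality of the MLE
    se <- diag(solve(est$hessian))^0.5
    cbind(lower = est$par - 1.96 * se, upper = est$par + 1.96 * se)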
    • Using the glm() function
    m1 <- glm(cbind(challenger$O.ring,6-challenger$O.ring) ~ Temp, family = binomial(link = "logit"),data=challenger)
    summary(m1)
    ## 
    ## Call:
    ## glm(formula = cbind(challenger$O.ring, 6 - challenger$O.ring) ~ 
    ##     Temp, family = binomial(link = "logit"), data = challenger)
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)   
    ## (Intercept)  6.75183    2.97989   2.266  0.02346 * 
    ## Temp        -0.13971    0.04647  -3.007  0.00264 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 28.761  on 22  degrees of freedom
    ## Residual deviance: 19.093  on 21  degrees of freedom
    ## AIC: 36.757
    ## 
    ## Number of Fisher Scoring iterations: 5
    • Prediction
      • We can get predictions using \(\mathbf{X}\hat{\boldsymbol{\beta}}\) or the inverse of the logit link function \(\hat{\boldsymbol{\mu}}=\frac{1}{1+e^{-\mathbf{X}\hat{\boldsymbol{\beta}}}}\)
      • Confidence intervals for \(\mathbf{X}\hat{\boldsymbol{\beta}}\) are straightforward
      • Confidence intervals for \(\hat{\boldsymbol{\mu}}=\frac{1}{1+e^{-\mathbf{X}\hat{\boldsymbol{\beta}}}}\) are not straightforward because of the nonlinear transformation of \(\hat{\boldsymbol{\beta}}\) (one workaround is sketched after the code below)
    beta.hat <- coef(m1)
    
    new.temp <- seq(0,80,by=0.01)
    X.new <- model.matrix(~new.temp)
    
    mu.hat <- (1/(1+exp(-X.new%*%beta.hat)))
    
    plot(challenger$Temp,challenger$O.ring,xlab="Temperature",ylab="Number of incidents",xlim=c(10,80),ylim=c(0,6))
    points(new.temp,6*mu.hat,type="l",col="red")
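    • One common workaround (sketched here, not part of the original code): compute the Wald interval for \(\mathbf{X}\hat{\boldsymbol{\beta}}\) on the link scale with predict(..., se.fit = TRUE), then back-transform the endpoints with the inverse logit.
    # Sketch: endpoint transformation. Build the interval on the link (logit) scale,
    # then apply the inverse link (plogis) to get an interval for mu.hat
    pred.link <- predict(m1, newdata = data.frame(Temp = new.temp), type = "link", se.fit = TRUE)
    mu.lci <- plogis(pred.link$fit - 1.96 * pred.link$se.fit)
    mu.uci <- plogis(pred.link$fit + 1.96 * pred.link$se.fit)
    points(new.temp, 6 * mu.lci, type = "l", lty = 2, col = "red")
    points(new.temp, 6 * mu.uci, type = "l", lty = 2, col = "red")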

    • Predictive distribution
    library(mosaic)
    
    predict.bs <- function() {
        m1 <- glm(cbind(O.ring, 6 - O.ring) ~ Temp, family = binomial(link = "logit"),
            data = resample(challenger))
        p <- predict(m1, newdata = data.frame(Temp = 31), type = "response")
        y <- rbinom(1, 6, p)
        y
    }
    
    bootstrap <- do(1000) * predict.bs()
    
    par(mar = c(5, 4, 7, 2))
    hist(bootstrap[, 1], col = "grey", xlab = "Number of O-ring failures", main = "Empirical distribution of O-ring failures at 31 degrees",
        freq = FALSE, right = FALSE, breaks = c(-0.5:6.5))

    # 95% equal-tailed CI based on percentiles of the empirical distribution
    quantile(bootstrap[, 1], prob = c(0.025, 0.975))
    ##  2.5% 97.5% 
    ##     0     6
    # Probability of 6 O-rings failing
    length(which(bootstrap[, 1] == 6))/dim(bootstrap)[1]
    ## [1] 0.537
  • Summary

    • The generalized linear model is useful when the distributional assumptions of the linear model are untenable.
      • Small counts
      • Binary data
    • Two cultures
      • All distributions are approximations of an unknown data-generating process, so why does it matter? Use the linear model because it is easier to understand.
      • Personal philosophy
        • Picking a distribution whose support matches that of the data gets closer to the etiology of the process that generated the data.
        • Easier for me to think scientifically about the process