To simplify the derivation, first consider a random effects model with no fixed effects beyond an overall mean \(\mu\):
\[
y_{ij} = \mu + \theta_i + \epsilon_{ij}, \quad \theta_i \overset{iid}{\sim} N(0, \tau^2), \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2).
\]
The joint density of the data and the random effects factors as
\[
f(\mathbf{y}, \boldsymbol{\theta}) = \prod_i \left[ \prod_{j=1}^{n_i} f(y_{ij} \mid \theta_i) \right] f(\theta_i),
\]
and maximizing it over the \(\theta_i\) (with the variance components held fixed) yields the BLUP.
The estimated random intercept \(\hat{\theta}_i\) can be expressed as a precision-weighted combination of the overall mean \(\mu\) and the group-specific mean \(\bar{y}_i\):
\[
\hat{\theta}_i = \frac{(1/\tau^2)\,\mu + (n_i/\sigma^2)\,\bar{y}_i}{1/\tau^2 + n_i/\sigma^2}.
\]
As \(\tau^2 \to 0\), all \(\hat{\theta}_i \to \mu\) (complete shrinkage toward the common mean).
As \(\tau^2 \to \infty\), \(\hat{\theta}_i \to \bar{y}_i\) (no shrinkage, group means are used as-is).
The amount of shrinkage depends jointly on \(\tau^2\), \(\sigma^2\), and \(n_i\).
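As a quick illustration, here is a minimal R sketch of how the weight on the group mean, \(B_i = n_i/(n_i + \sigma^2/\tau^2)\), behaves; the helper shrink_weight and all numeric values are made up for illustration:
shrink_weight <- function(n_i, sigma2, tau2) n_i / (n_i + sigma2 / tau2)

sigma2 <- 4                                # assumed within-group variance
n_i <- c(2, 10, 50)                        # assumed group sizes
shrink_weight(n_i, sigma2, tau2 = 0.1)     # tau^2 small: weights near 0, heavy shrinkage toward mu
shrink_weight(n_i, sigma2, tau2 = 100)     # tau^2 large: weights near 1, little shrinkage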
Parameter Estimation
Theorem
Let \(Y = X\beta + \epsilon\) with \(E(Y) = X\beta\) and \(\text{Cov}(Y) = \sigma^2 V\), where \(X\) has full rank and \(V\) is a known positive definite matrix. The best linear unbiased estimator (BLUE) of \(\beta\) is
\[
\hat{\beta}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y.
\]
Under normality, the value of \(\beta\) that maximizes the likelihood function is the GLS estimate, i.e., \(\hat{\beta}_{MLE} = \hat{\beta}_{GLS}\).
To estimate the fixed effects \(\boldsymbol{\beta}\) and the variance components \((\sigma^2, \tau^2)\), we use the marginal model:
\[
\mathbf{y} \sim N(\mathbf{X}\boldsymbol{\beta}, \mathbf{V}),
\] where
\[
\mathbf{V} = \mathbf{Z} G \mathbf{Z}^T + R
\] captures both the between-group (\(G = \tau^2 I\)) and within-group (\(R = \sigma^2 I\)) variability.
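To make this structure concrete, here is a minimal R sketch that assembles \(\mathbf{V} = \mathbf{Z} G \mathbf{Z}^T + R\); the design (3 groups of 2 observations) and the values of \(\tau^2\) and \(\sigma^2\) are made up for illustration:
group <- factor(rep(1:3, each = 2))
Z <- model.matrix(~ group - 1)        # 6 x 3 random-effect design matrix
tau2   <- 2                           # assumed between-group variance
sigma2 <- 1                           # assumed within-group variance
G <- tau2 * diag(3)
R <- sigma2 * diag(6)
V <- Z %*% G %*% t(Z) + R             # block-diagonal marginal covariance
V
The result is block diagonal: observations in the same group share covariance \(\tau^2\), while observations in different groups are uncorrelated.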
In ordinary linear regression \(Y = X \beta + \epsilon\) with \(\epsilon \sim N(0, \sigma^2 I)\), the OLS estimator
\[
\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y
\]
is optimal because the errors are uncorrelated and have constant variance.
In LMMs, the data are correlated: \[
\text{Cov}(\mathbf{y}) = \mathbf{V} \neq \sigma^2 I.
\] The optimal estimator becomes the generalized least squares (GLS) estimator:
\[
\hat{\beta}_{GLS} = (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{V}^{-1} \mathbf{y}.
\]
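A minimal sketch comparing OLS with GLS on simulated correlated data; the design, covariance \(V\), and coefficients below are all made up for illustration:
set.seed(1)
group <- factor(rep(1:3, each = 2))
Z <- model.matrix(~ group - 1)
V <- Z %*% (2 * diag(3)) %*% t(Z) + diag(6)       # V = Z G Z' + R as above
X <- cbind(1, rnorm(6))                           # intercept + one covariate
y <- drop(X %*% c(1, 0.5) + t(chol(V)) %*% rnorm(6))  # y ~ N(X beta, V)

beta_ols <- solve(t(X) %*% X, t(X) %*% y)
beta_gls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% y)
cbind(OLS = beta_ols, GLS = beta_gls)             # both unbiased; GLS is more efficient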
The MLE of the variance component, \(\hat{\sigma}^2_{MLE}\), is biased: taking the expectation (e.g., in the univariate case) gives \(E(\hat{\sigma}^2_{MLE}) = \frac{n-1}{n} \sigma^2 \neq \sigma^2\).
The bias arises because the MLE plugs in the estimated mean \(\hat{y}\). If the true mean \(\mu\) were used instead, the estimator would no longer be biased. Intuitively, some of the information in the data is spent estimating the mean, and this must be accounted for when estimating the spread around it.
The idea behind REML is to remove the information about the mean from the likelihood and maximize what remains, yielding a variance estimate \(\hat{V}_{REML}\) that does not depend on the mean.
The REML estimator is equivalent to the MLE based on transformed outcomes \(Y^* = AY\), where \(A\) is chosen so that the distribution of \(Y^*\) does not depend on \(\beta\).
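A small numerical sketch of this idea, assuming \(V = \sigma^2 I\) for simplicity: the rows of \(A\) are chosen orthogonal to the column space of \(X\) (so \(AX = 0\)), hence \(Y^* = AY\) carries no information about \(\beta\), and the MLE based on \(Y^*\) reproduces the unbiased \(RSS/(n-p)\) estimate. The simulated data below are made up for illustration:
set.seed(2)
n <- 20; p <- 2
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 0.5) + rnorm(n))

Q <- qr.Q(qr(X), complete = TRUE)       # orthonormal basis of R^n
A <- t(Q[, (p + 1):n])                  # rows orthogonal to col(X): A %*% X is ~0
ystar <- A %*% y                        # distribution of ystar is free of beta

sum(ystar^2) / (n - p)                  # REML-type estimate of sigma^2
sum(resid(lm(y ~ X - 1))^2) / (n - p)   # identical: RSS / (n - p)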
The MLE estimates of the variance components \((\sigma^2, \tau^2)\) are biased, much like the naive variance estimate in OLS: \[
\hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2.
\]
Restricted Maximum Likelihood (REML) corrects this bias by accounting for the degrees of freedom used to estimate \(\boldsymbol{\beta}\): \[
\hat{\sigma}^2_{REML} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2.
\]
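A quick simulation (with made-up \(n\) and \(\sigma^2\)) showing the bias of the \(1/n\) estimator and the effect of the \(1/(n-1)\) correction:
set.seed(3)
n <- 10; sigma2 <- 4
ests <- replicate(10000, {
  x <- rnorm(n, mean = 5, sd = sqrt(sigma2))
  c(mle = sum((x - mean(x))^2) / n,
    corrected = sum((x - mean(x))^2) / (n - 1))
})
rowMeans(ests)   # MLE averages near (n-1)/n * sigma2 = 3.6; corrected version near 4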
REML is preferred in most applications and is the default in lme4::lmer() in R.
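For instance, with lme4's built-in sleepstudy data, the default REML fit can be compared to the ML fit; the REML variance components are typically somewhat larger (a sketch, not part of the TLC analysis below):
library(lme4)
fit_reml <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)                # REML (default)
fit_ml   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)  # ML
VarCorr(fit_reml)
VarCorr(fit_ml)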
Important: the covariance matrix \(\mathbf{V}\) now appears in the estimator. This differs from the independent-error case, where \(\hat{\beta}_{MLE} = (X'X)^{-1} X'y\) does not involve the variance. When outcomes are correlated, the variance structure matters for estimating \(\beta\).
library(nlme)
library(ggplot2)
library(dplyr)
library(gridExtra)
library(lme4)
tlcdata <- readRDS("data/tlc.RDS")
succimer_data_only <- tlcdata %>%
  filter(treatment == "succimer")                        # only running for the treatment group

tlc.gls <- gls(lead ~ relevel(as.factor(week), 4),
               data = succimer_data_only,
               correlation = corSymm(form = ~ 1 | id),   # within-individual correlation
               weights = varIdent(form = ~ 1 | week))    # separate variance for each week
summary(tlc.gls)
Generalized least squares fit by REML
Model: lead ~ relevel(as.factor(week), 4)
Data: succimer_data_only
AIC BIC logLik
1308.337 1354.231 -640.1687
Correlation Structure: General
Formula: ~1 | id
Parameter estimate(s):
Correlation:
1 2 3
2 0.401
3 0.384 0.731
4 0.495 0.507 0.455
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | week
Parameter estimates:
0 1 4 6
1.000000 1.528082 1.563878 1.841541
Coefficients:
Value Std.Error t-value p-value
(Intercept) 20.762 1.307627 15.877618 0e+00
relevel(as.factor(week), 4)0 5.778 1.137826 5.078104 0e+00
relevel(as.factor(week), 4)1 -7.240 1.203585 -6.015361 0e+00
relevel(as.factor(week), 4)4 -5.248 1.273641 -4.120470 1e-04
Correlation:
(Intr) r(.(),4)0 r(.(),4)1
relevel(as.factor(week), 4)0 -0.840
relevel(as.factor(week), 4)1 -0.629 0.614
relevel(as.factor(week), 4)4 -0.630 0.616 0.790
Standardized residuals:
Min Q1 Med Q3 Max
-1.8020146 -0.8258012 -0.1227684 0.5184783 4.6654246
Residual standard error: 5.020968
Degrees of freedom: 200 total; 196 residual
Suppose we want the estimated covariance matrix?
# Get the estimated covariance matrix for individual 3
getVarCov(tlc.gls, individual = 3)

# To get the variance weights
summary(tlc.gls$modelStruct$varStruct)
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | week
Parameter estimates:
0 1 4 6
1.000000 1.528082 1.563878 1.841541
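These weights are standard-deviation multipliers relative to the reference week (week 0), so the per-week variances come from scaling the residual standard error. A quick check against the diagonal of getVarCov(), reusing the tlc.gls fit above:
sigma_hat <- tlc.gls$sigma                        # residual SE (5.020968 above)
w <- coef(tlc.gls$modelStruct$varStruct,
          unconstrained = FALSE, allCoef = TRUE)  # weights, including the reference week
(sigma_hat * w)^2                                 # per-week variances; compare with getVarCov()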
Summary

| Section | Description |
|---|---|
| What is the model? | Random intercept model with hierarchical structure: \(y_{ij} = \mu + \theta_i + \epsilon_{ij}\), where \(\theta_i \sim N(0, \tau^2)\) and \(\epsilon_{ij} \sim N(0, \sigma^2)\). The data's covariance is \(\mathbf{V} = \mathbf{Z}G\mathbf{Z}^T + R\), with \(G = \tau^2 I\) (between-group) and \(R = \sigma^2 I\) (within-group). |
| BLUP of \(\theta_i\) | Best Linear Unbiased Prediction: \(\hat{\theta}_i = B_i \bar{y}_i\) with shrinkage factor \(B_i = \frac{n_i}{n_i + \sigma^2/\tau^2}\). Small \(n_i\) or small \(\tau^2\) leads to more shrinkage toward \(\mu\), while large \(n_i\) or large \(\tau^2\) reduces shrinkage. |
| Borrowing of information | \(\hat{\theta}_i = \frac{(1/\tau^2)\mu + (n_i/\sigma^2)\bar{y}_i}{1/\tau^2 + n_i/\sigma^2}\). Groups with fewer observations are pulled toward the overall mean \(\mu\), while larger groups rely more on their own group mean \(\bar{y}_i\). |
| Parameter estimation | Fixed effects \(\beta\) estimated using GLS: \(\hat{\beta}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y\). Variance components \((\sigma^2, \tau^2)\) estimated by MLE or REML. |
| Variance components | MLE of variance components is biased (like the naive OLS variance). REML adjusts for the loss of degrees of freedom when estimating \(\beta\) and provides unbiased estimates of \(\sigma^2\) and \(\tau^2\). |
| Interpretation | Total variance = \(\sigma^2 + \tau^2\). Intraclass correlation: \(\rho = \frac{\tau^2}{\tau^2 + \sigma^2}\). The magnitude of \(\rho\) reflects within-group similarity relative to residual variation. |
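As a sketch of the ICC calculation from fitted variance components, using lme4's built-in sleepstudy data (an illustration, not the TLC analysis above):
library(lme4)
fit <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)
vc <- as.data.frame(VarCorr(fit))
tau2   <- vc$vcov[vc$grp == "Subject"]    # between-subject variance
sigma2 <- vc$vcov[vc$grp == "Residual"]   # residual variance
tau2 / (tau2 + sigma2)                    # intraclass correlation rho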