Module 1A: Linear Mixed Models

Reading

Overview

In this module, you’ll learn how to model data with hierarchical or correlated structures using linear mixed models (LMMs). These models extend standard regression to handle repeated measures or nested data structures, commonly found in longitudinal and multilevel public health data.

After completing this module, you will be able to:

  • Identify when LMMs are appropriate

  • Distinguish between fixed and random effects

  • Fit and interpret random intercept and random slope models in R

  • Understand concepts like shrinkage, intraclass correlation, and BLUPs

  • Write random effects models in their hierarchical formulation

Motivation and Examples

Motivation:

Correlated data structures arise when multiple observations are taken from the same unit or group. Examples include:

  • Longitudinal data: Repeated measures data on each subject
    • \(y_{it}\) denotes the measure at time \(t\) for unit \(i\).
  • Clustered or multilevel data: Observations within nested group structures.
    • A collection of units from a population of similar units, e.g., \(y_{ij}\) denotes the measure for person \(i\) in demographic group \(j\).

Ignoring this structure can lead to biased estimates and invalid inference.

Motivating Example: The TLC Study

The Treatment of Lead-Exposed Children (TLC) trial was a placebo-controlled randomized study of succimer (a chelating agent) in children with blood lead levels of 20-44 micrograms/dL. The data consist of four repeated measurements of blood lead level, obtained at baseline (week 0), week 1, week 4, and week 6, on 100 children who were randomly assigned to chelation treatment with succimer or placebo.

| Variable  | Description           |
|-----------|-----------------------|
| id        | child identifier      |
| treatment | succimer or placebo   |
| week      | measurement week      |
| lead      | lead level (outcome)  |

tlcdata <- readRDS("data/tlc.RDS")
head(tlcdata)
    id treatment week lead
1    1   placebo    0 30.8
101  1   placebo    1 26.9
201  1   placebo    4 25.8
301  1   placebo    6 23.8
2    2  succimer    0 26.5
102  2  succimer    1 14.8

Why Not OLS? The Wrong Approach

If we ignore clustering and fit the standard linear regression (OLS): \[ \begin{align*} y_{ij}& = \beta_0 + \beta_1 x_{ij} + \epsilon_{ij}\\ \epsilon_{ij} & \overset{iid}{\sim} N(0, \sigma^2) \end{align*} \] we assume that all observations are independent.

But repeated measurements from the same child are not independent.

fit.lm <- lm(lead ~ week, data = tlcdata)
summary(fit.lm)

Call:
lm(formula = lead ~ week, data = tlcdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-19.775  -3.749   0.328   4.454  43.330 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  22.9760     0.6079  37.793   <2e-16 ***
week         -0.4010     0.1670  -2.401   0.0168 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.966 on 398 degrees of freedom
Multiple R-squared:  0.01428,   Adjusted R-squared:  0.0118 
F-statistic: 5.765 on 1 and 398 DF,  p-value: 0.01681

What assumptions of OLS are we violating?

  • \(\hat{\beta}_1\) is still an unbiased and consistent estimator of \(\beta_1\), so the point estimate itself is not the problem.

  • The errors are not independent, which leads to incorrect standard error estimates (see the check below).

  • \(\sigma^2\) conflates within-child and between-child variability.

  • Limitation: OLS cannot forecast individual trajectories.
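
To make the problem concrete, we can check whether OLS residuals from the same child are correlated across weeks. A minimal sketch, assuming the tlcdata and fit.lm objects defined above (the res and res.wide names are illustrative):

# Attach the OLS residuals and reshape to one row per child,
# one column per measurement week
tlcdata$res <- resid(fit.lm)
res.wide <- reshape(tlcdata[, c("id", "week", "res")],
                    idvar = "id", timevar = "week", direction = "wide")
# Large off-diagonal correlations indicate that errors
# within a child are far from independent
round(cor(res.wide[, -1]), 2)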

The Better Approach: Introducing Linear Mixed Models

LMMs are extensions of regression in which intercepts and/or slopes are allowed to vary across individual units. They were developed to deal with correlated and/or nested data.

Taking our motivating example, we can assign a separate intercept for each child:

\[ \begin{align*} y_{ij} &= \beta_{0i} + \beta_1 x_{ij} + \epsilon_{ij}, \\ \epsilon_{ij} & \overset{iid}{\sim} N(0, \sigma^2) \end{align*} \]

Interpretation:

  • \(\beta_{0i}\) is the child-specific lead level at week 0 (i.e., when \(x_{ij} = 0\)) and \(\beta_1\) is the slope, assumed constant across children.
  • \(\sigma^2\) captures within-child variability.
  • This mix of fixed and random effects is what gives mixed models their name. A small "no pooling" illustration follows the table below.

| Effect Type | Definition                | Example                                     |
|-------------|---------------------------|---------------------------------------------|
| Fixed       | Population-average effect | Effect of week on lead                      |
| Random      | Child-specific deviation  | Child-specific intercept \(\beta_{0i}\)     |
| Random      | Child-week deviation      | Child-week error \(\epsilon_{ij}\)          |
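
One way to see what child-specific intercepts buy us is the "no pooling" approach: fit a separate fixed intercept for each child with a shared week slope. A sketch, assuming tlcdata from above (fit.nopool is an illustrative name):

# One fixed intercept per child (100 of them) plus a common week slope;
# the 0 drops the overall intercept so every factor level gets its own
fit.nopool <- lm(lead ~ 0 + factor(id) + week, data = tlcdata)
head(coef(fit.nopool))    # first few child-specific intercepts
coef(fit.nopool)["week"]  # shared slope estimate

This treats each \(\beta_{0i}\) as a free parameter. Random effects instead model the intercepts as draws from a common distribution, which is what produces shrinkage toward the overall mean.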

Random Intercept Model

Consider the random intercept model with a vector of predictors \(x_{ij}\):

\[ \begin{align*} y_{ij} &=\mu + \theta_i + x_{ij}^T \beta + \epsilon_{ij}\\ \epsilon_{ij} & \overset{iid}{\sim} N(0, \sigma^2)\\ \theta_i & \overset{iid}{\sim} N(0, \tau^2) \\ \theta_i& \perp \epsilon_{ij} \end{align*} \]

  • \(\mu\) = overall intercept (population mean when \(x_{ij}=0\)).

  • \(\theta_i\) = child-specific deviation from \(\mu\).

  • \(\beta_{0i} = \mu + \theta_i\) = participant-specific intercept, i.e., the child-specific average.

  • \(\beta\) = vector of coefficients that do not vary across participants (the fixed effects).

  • \(\tau^2\) = between-participant variability.

  • \(\sigma^2\) = residual variance, i.e., within-participant variability ("measurement error").

Properties of the Random Intercept Model

The derivations below walk through key statistical properties of this model.

Fitting a mixed effects model with a random intercept in R:

library(lmerTest)
Loading required package: lme4
Loading required package: Matrix

Attaching package: 'lmerTest'
The following object is masked from 'package:lme4':

    lmer
The following object is masked from 'package:stats':

    step
fit.randomeffects <- lmer(lead ~ week + (1|id), data = tlcdata)
summary(fit.randomeffects)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: lead ~ week + (1 | id)
   Data: tlcdata

REML criterion at convergence: 2695.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6337 -0.3461  0.0601  0.4131  6.1167 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 29.63    5.444   
 Residual             33.98    5.829   
Number of obs: 400, groups:  id, 100

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  22.9760     0.7030 161.6442  32.683  < 2e-16 ***
week         -0.4010     0.1222 299.0000  -3.281  0.00116 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
week -0.478

Model Interpretation: The fixed effect captures the global, or average, change in lead level for each one-unit increase in week. Here \(\hat{\beta}_1 = -0.401\), meaning lead decreases by about 0.4 micrograms/dL on average for every additional week, an effect that is statistically significant (\(p < 0.01\)).

The intercept, 22.98, corresponds to the population-average lead level at week 0.

The standard error of the slope, \(se(\hat{\beta}_1) = 0.122\), captures the uncertainty associated with the estimate.

The random effect for child gives each child its own intercept \(\beta_{0i} = \mu + \theta_i\); the estimated variance of these intercepts is \(\hat{\tau}^2 = 29.63\).
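
The predicted child-specific deviations \(\hat{\theta}_i\), known as BLUPs (best linear unbiased predictors), and the combined intercepts \(\hat{\beta}_{0i} = \hat{\mu} + \hat{\theta}_i\) can be extracted from the fitted model:

# Predicted random effects (BLUPs): child-specific deviations from mu
head(ranef(fit.randomeffects)$id)
# Child-specific coefficients: fixed-effect estimates plus the BLUPs
head(coef(fit.randomeffects)$id)

Because BLUPs are shrunk toward zero, these child-specific intercepts lie closer to \(\hat{\mu}\) than intercepts estimated separately for each child would be.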

Covariance Structure and Matrix Form

A random intercept model is also known as a two-level variance component model:

  • \(\theta_i\) is a random intercept shared by all observations in group \(i\), with between-group variance \(\tau^2\).

  • \(\epsilon_{ij}\) is the individual-level residual error.

Derivation: \(\text{Cov}(y_{ij}, y_{ik})\) for \(j \ne k\)

We now derive the covariance between two observations from the same person \(i\). Let’s expand both observations:

\[ \begin{aligned} y_{ij} &= \mu + \theta_i + x_{ij}^T \beta + \epsilon_{ij} \\ y_{ik} &= \mu + \theta_i + x_{ik}^T \beta + \epsilon_{ik} \end{aligned} \]

Then,

\[ \text{Cov}(y_{ij}, y_{ik}) = \text{Cov}(\theta_i + \epsilon_{ij},\ \theta_i + \epsilon_{ik}) \]

Using bilinearity of covariance:

\[ = \text{Cov}(\theta_i, \theta_i) + \text{Cov}(\theta_i, \epsilon_{ik}) + \text{Cov}(\epsilon_{ij}, \theta_i) + \text{Cov}(\epsilon_{ij}, \epsilon_{ik}) \]

Now plug in the assumptions:

  • \(\text{Var}(\theta_i) = \tau^2\)
  • \(\text{Cov}(\theta_i, \epsilon_{ij}) = 0\)
  • \(\text{Cov}(\epsilon_{ij}, \epsilon_{ik}) = 0\) for \(j \ne k\)

So:

\[ \text{Cov}(y_{ij}, y_{ik}) = \tau^2\quad ✅ \] Observations from the same group/cluster share the same random intercept \(\theta_i\), which introduces positive correlation.


Derivation: \(\text{Var}(y_{ij})\)

Total variance = between-group variance + within-group variance. \[ \text{Var}(y_{ij}) = \text{Var}(\theta_i + \epsilon_{ij}) = \tau^2 + \sigma^2\quad ✅ \]


Covariance Matrix

Suppose we observe two individuals, each measured at three time points:

  • Subject 1: \(y_{11}, y_{12}, y_{13}\)
  • Subject 2: \(y_{21}, y_{22}, y_{23}\)

The marginal covariance matrix \(\text{Cov}(\mathbf{y})\) for the stacked vector:

\[ \mathbf{y} = \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{bmatrix} \]

is:

\[ \text{Cov}(\mathbf{y}) = \begin{bmatrix} \tau^2 + \sigma^2 & \tau^2 & \tau^2 & 0 & 0 & 0 \\ \tau^2 & \tau^2 + \sigma^2 & \tau^2 & 0 & 0 & 0 \\ \tau^2 & \tau^2 & \tau^2 + \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \tau^2 + \sigma^2 & \tau^2 & \tau^2 \\ 0 & 0 & 0 & \tau^2 & \tau^2 + \sigma^2 & \tau^2 \\ 0 & 0 & 0 & \tau^2 & \tau^2 & \tau^2 + \sigma^2 \\ \end{bmatrix} \]

  • Within a subject, all pairs of outcomes have covariance \(\tau^2\).
  • Between subjects, outcomes are independent (covariance = 0).
  • Diagonal entries reflect total variance: \(\tau^2 + \sigma^2\).

This structure is known as compound symmetry within each subject (constant pairwise correlation), and block-diagonal across subjects.
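
This structure is easy to verify numerically. A sketch that builds the 6 × 6 matrix above, plugging in the variance components estimated earlier (\(\hat{\tau}^2 = 29.63\), \(\hat{\sigma}^2 = 33.98\)):

tau2 <- 29.63; sigma2 <- 33.98
# Compound-symmetry block for one subject measured at 3 time points
block <- matrix(tau2, nrow = 3, ncol = 3) + diag(sigma2, 3)
# Block-diagonal marginal covariance for two independent subjects
Sigma <- kronecker(diag(2), block)
round(Sigma, 2)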

Random Intercept Model in Matrix Form

Consider the mixed model with random intercepts for \(n\) groups, where group \(i\) contributes \(r_i\) observations, and define \(N = \sum_{i=1}^n r_i\).

\[ \mathbf{y} = \mathbf{Z} \mathbf{\theta} + \mathbf{X}\mathbf{\beta} + \mathbf{\epsilon} \]

where

  • \(\mathbf{y} = N \times 1\) vector of responses.

  • \(\mathbf{Z} = N \times n\) design matrix of indicator variables for each group.

  • \(\mathbf{\theta} = n \times 1\) vector of random intercepts.

  • \(\mathbf{X} = N\times p\) design matrix of fixed effects (including overall intercept).

  • \(\mathbf{\beta} = p\times 1\) vector of fixed effects.

  • \(\mathbf{\epsilon} = N \times 1\) vector of residual errors.
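
For the TLC data, these design matrices can be constructed explicitly; a sketch, assuming tlcdata from above:

# Z: N x n indicator matrix, one column per child
Z <- model.matrix(~ 0 + factor(id), data = tlcdata)
# X: N x p fixed-effects design matrix (overall intercept and week)
X <- model.matrix(~ week, data = tlcdata)
dim(Z)  # 400 x 100
dim(X)  # 400 x 2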

Intraclass Correlation

  • Note that the within-group covariance is \(Cov(y_{ij}, y_{ij'}) = \tau^2\).

  • So the correlation between observations within the same group is

\[ \rho = Corr(y_{ij}, y_{ij'}) = \frac{\tau^2}{\tau^2 + \sigma^2} \text{ for all }j \neq j'. \]

  • The value \(\rho\) is often called the intraclass correlation (ICC). It measures the degree of similarity among same-group observations relative to the residual error \(\sigma^2\).
  • It describes how strongly units in the same group resemble each other.
  • \(\rho \rightarrow 0\) when \(\tau^2 \rightarrow 0\) (i.e., all groups share the same intercept).
  • \(\rho \rightarrow 0\) when \(\sigma^2 \rightarrow \infty\) (i.e., growing measurement error).
  • \(\rho \rightarrow 1\) when \(\tau^2 \rightarrow \infty\) (i.e., large separation between intercepts).
  • \(\rho \rightarrow 1\) when \(\sigma^2 \rightarrow 0\) (i.e., no measurement error).
  • The above ICC has an exchangeable structure because the correlation is constant between any pair of within-group observations. The ICC for the fitted TLC model is computed below.
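
For the fitted TLC model, the ICC follows directly from the variance components in the lmer output:

# Extract tau^2 (id) and sigma^2 (Residual) from the fitted model
vc <- as.data.frame(VarCorr(fit.randomeffects))
tau2   <- vc$vcov[vc$grp == "id"]
sigma2 <- vc$vcov[vc$grp == "Residual"]
tau2 / (tau2 + sigma2)  # 29.63 / (29.63 + 33.98), about 0.47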

Simulation: Varying Between-Subject Variability
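
The interactive simulation is not reproduced here; instead, a minimal sketch of the idea, generating random-intercept data for several values of \(\tau^2\) with \(\sigma^2\) held fixed and examining the estimated ICC (sim_icc is an illustrative helper; lmerTest is assumed loaded as above):

set.seed(1)
# Simulate one random-intercept data set and return the estimated ICC
sim_icc <- function(tau2, sigma2 = 25, n = 100, r = 4) {
  theta <- rnorm(n, mean = 0, sd = sqrt(tau2))   # group intercepts
  y  <- rep(theta, each = r) + rnorm(n * r, 0, sqrt(sigma2))
  id <- factor(rep(seq_len(n), each = r))
  vc <- as.data.frame(VarCorr(lmer(y ~ 1 + (1 | id))))
  vc$vcov[vc$grp == "id"] / sum(vc$vcov)
}
# Larger tau^2 (more separation between subjects) means a larger ICC
sapply(c(5, 25, 100), sim_icc)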

Simulation: Varying Sources of Error
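
Similarly, we can hold the total variance fixed and shift it between the two sources, reusing the illustrative sim_icc helper from the previous sketch:

set.seed(2)
total <- 50
# Shift variance from between-subject (tau^2) to within-subject (sigma^2)
for (tau2 in c(45, 25, 5)) {
  icc <- sim_icc(tau2, sigma2 = total - tau2)
  cat(sprintf("tau2 = %2d, sigma2 = %2d, estimated ICC = %.2f\n",
              tau2, total - tau2, icc))
}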

Summary

| Section | Description |
|---------|-------------|
| What is the model? | Random intercept model: \(y_{ij} = \mu + \theta_i + x_{ij}^T \beta + \epsilon_{ij}\); in matrix form, \(\mathbf{y} = \mathbf{Z}\mathbf{\theta} + \mathbf{X}\mathbf{\beta} + \mathbf{\epsilon}\). \(\mu\) is the population mean (overall average); \(\theta_i\) are individual deviations in intercept from \(\mu\). |
| Model assumptions | \(\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)\), so \(\sigma^2\) is the within-subject variance; \(\theta_i \overset{iid}{\sim} N(0, \tau^2)\), so \(\tau^2\) is the between-subject variance; \(\theta_i \perp \epsilon_{ij}\); \(\rho = \frac{\tau^2}{\tau^2 + \sigma^2}\) (ICC). |
| Interpretation | \(\theta_i\) = individual deviation from \(\mu\); \(\beta\) = global (population-average) effect of \(X\); total variance = \(\sigma^2 + \tau^2\); ICC = degree of similarity among same-group observations relative to residual error. |