Module 1C Random Intercepts and Slopes

From Random Intercepts to Random Slopes:

\[ y_{ij} = (\beta_0 + \theta_{0i}) + (\beta_1 + \theta_{1i}) x_{ij} + \epsilon_{ij}, \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2) \]

Random Effects Covariance Structure

The random effects are modeled jointly as:

\[ \begin{pmatrix}\theta_{0i} \\ \theta_{1i} \end{pmatrix} \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau^2_0 & \rho \tau_0 \tau_1 \\ \rho \tau_0 \tau_1 & \tau_1^2 \end{pmatrix} \right) \] where:

  • \(\tau_0^2\) is the variance of random intercepts,
  • \(\tau_1^2\) is the variance of random slopes,
  • \(\rho\) is the correlation between intercepts and slopes.

This structure allows us to model not just variation in starting points (intercepts) and rates of change (slopes), but also how they are related (e.g., do groups with higher baselines tend to change faster?).
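
To make this structure concrete, below is a minimal simulation sketch of the random intercepts and slopes model. The sample sizes, measurement weeks, and parameter values are illustrative assumptions, not quantities taken from the tlc data.

library(MASS)  # for mvrnorm()

set.seed(1)
n_groups <- 50                        # number of subjects (illustrative)
beta0 <- 25;  beta1 <- -0.4           # fixed intercept and slope (assumed)
tau0  <- 5;   tau1  <- 0.4            # SDs of random intercepts and slopes (assumed)
rho   <- 0.1; sigma <- 2              # intercept-slope correlation and residual SD (assumed)

# Draw correlated random intercepts and slopes from the bivariate normal above
Sigma <- matrix(c(tau0^2,            rho * tau0 * tau1,
                  rho * tau0 * tau1, tau1^2), nrow = 2)
theta <- mvrnorm(n_groups, mu = c(0, 0), Sigma = Sigma)

# Long-format data: y_ij = (beta0 + theta_0i) + (beta1 + theta_1i) * x_ij + e_ij
weeks <- c(0, 1, 4, 6)                # illustrative measurement times
sim <- data.frame(id   = rep(seq_len(n_groups), each = length(weeks)),
                  week = rep(weeks, times = n_groups))
sim$lead <- (beta0 + theta[sim$id, 1]) +
  (beta1 + theta[sim$id, 2]) * sim$week +
  rnorm(nrow(sim), sd = sigma)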

Fitting the Random Intercept and Slope model in R

Let's fit the model using the tlc data:

library(ggplot2)
library(dplyr)
library(gridExtra) 

library(lme4)
tlcdata <- readRDS("data/tlc.RDS")

placebo_data <- tlcdata %>% filter(treatment == "placebo")
fit_random <- lmer(lead ~ week + (week|id), data = placebo_data)
summary(fit_random)
Linear mixed model fit by REML ['lmerMod']
Formula: lead ~ week + (week | id)
   Data: placebo_data

REML criterion at convergence: 1053.3

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.20808 -0.49837 -0.02268  0.47100  2.30541 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 23.3113  4.8282       
          week         0.1581  0.3976   0.02
 Residual              4.5016  2.1217       
Number of obs: 200, groups:  id, 50

Fixed effects:
            Estimate Std. Error t value
(Intercept) 25.68536    0.72018  35.665
week        -0.37213    0.08437  -4.411

Correlation of Fixed Effects:
     (Intr)
week -0.164
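
The quantities referenced in the interpretation below can be pulled directly from the fitted object; a quick sketch using standard lme4 extractor functions:

VarCorr(fit_random)          # tau_0, tau_1, and the intercept-slope correlation rho
sigma(fit_random)^2          # residual variance sigma^2
fixef(fit_random)            # fixed effects beta_0 and beta_1
head(coef(fit_random)$id)    # child-specific intercepts and slopes, beta + theta_i
head(ranef(fit_random)$id)   # deviations theta_0i, theta_1i from the fixed effects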

Interpretation:

  • Child-specific intercept: \((\beta_0 + \theta_{0i}) \sim N(25.68, 23.31)\).
  • Child-specific slope: \((\beta_1 + \theta_{1i}) \sim N(-0.372, 0.158)\).
  • The estimated correlation \(\rho\) between the random intercepts \(\theta_{0i}\) and random slopes \(\theta_{1i}\) is 0.02.
  • The 95% confidence interval for the average slope, \(-0.372 \pm 1.96 \times 0.084 \approx (-0.54, -0.21)\), does not include zero. Thus we reject the null hypothesis that a child’s lead level does not change with time; on average, lead levels decline by about 0.37 units per week (see the sketch after this list).
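
As a check on the last bullet, the Wald interval for the average slope can be computed from the fixed-effect estimate and its standard error (confint() gives profile-likelihood intervals instead):

est <- fixef(fit_random)["week"]
se  <- sqrt(vcov(fit_random)["week", "week"])
est + c(-1.96, 1.96) * se    # roughly (-0.54, -0.21); excludes zero
# confint(fit_random)        # profile intervals for all parameters (slower)
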
library(ggplot2)
ggplot(placebo_data, aes(x = week, y = lead, group = id)) +
  geom_point(alpha = 0.5) +
  geom_line(aes(y = predict(fit_random)), color = "blue") +
  labs(title = "Random Intercepts and Slopes by Individual",
       x = "Week", y = "Lead Level")

Comparing Models

  • Is the random intercept plus slope preferred over random intercept only?

  • AIC: Akaike’s Information Criterion.

  • AIC is widely used in model selection, for example to choose which variables (or random effects) should be in the model. \[ AIC = -2l(\mathbf{\theta}) + 2p \] where \(l(\mathbf{\theta})\) is the log-likelihood for all parameters \(\mathbf{\theta}\) and \(p\) is the number of parameters.

  • The MLE of the variance components is biased (similar to the ML estimate of the variance in simple linear regression), which is why REML is the default for estimation.

  • A lower AIC is better; the general rule of thumb is that a difference of 2 or more indicates a substantially better model.

  • The AIC is based on the maximum likelihood, so set REML = FALSE in lmer() so that the parameters are estimated by ML before comparing models, as in the sketch below.
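
A minimal sketch of that comparison, refitting both models by maximum likelihood before comparing (the model names below are illustrative):

# Refit by ML (REML = FALSE) so the likelihoods, and hence AICs, are comparable
fit_int_ml   <- lmer(lead ~ week + (1 | id),    data = placebo_data, REML = FALSE)
fit_slope_ml <- lmer(lead ~ week + (week | id), data = placebo_data, REML = FALSE)

AIC(fit_int_ml, fit_slope_ml)   # lower AIC (by 2 or more) favors that model

# AIC by hand: -2 * log-likelihood + 2 * number of parameters
-2 * as.numeric(logLik(fit_slope_ml)) + 2 * attr(logLik(fit_slope_ml), "df")

# anova() on the ML fits also reports AIC, BIC, and a likelihood ratio test
anova(fit_int_ml, fit_slope_ml)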