6.4 Linear Mixed Models: Theory

Linear Mixed Models are used for cases where your observations are clustered. This means that certain groups of your observations share something in common other than the values of the predictor variables and covariates. The most common use of the Linear Mixed Model that you are likely to encounter is for studies with repeated measures from the same individual, such as a longitudinal study or an experiment with an independent variable manipulated within subjects. This is not the only application, however: there are many other possible sources of clustering in datasets, and you can use Linear Mixed Models in those cases too.

In the case of repeated measures from the same individuals, you know that all the observations from participant 1 share something in common (even if they involved different levels of the IV). What they share in common is that they all came from the same person. Maybe that person has a characteristic way of doing the task. Maybe they always get rather high scores, or always rather low ones. You want to account for this non-independence of sets of observations in your model. Accounting for it will have two advantages.

First, it will help you avoid over-weighting observations from the same cluster for between-cluster comparisons. Imagine you have an experiment on music learning where the scores in group A are 3, 5, 5, 7, 8, 8 and the scores in group B are 4, 5, 5, 17, 16, 19. This looks like you might have a between-group difference (the means of the two groups are certainly very different). But if I now tell you that the 17, the 16 and the 19 all came from the same participant doing the task three times, and that participant was a professional musician, you will not be so impressed. Essentially, you will say, the 17, the 16 and the 19 should count as one observation since they are not independent of one another. The Linear Mixed Model will help us not overweight these kinds of non-independent observations in between-group comparisons, by identifying the three high scores as instances of the same cluster within group B rather than three independent cases of scoring highly in group B.
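The arithmetic of this example is easy to check directly. Averaging within clusters before comparing groups is not the mixed model itself, but it illustrates why the non-independence matters (a sketch in plain Python; the scores are the ones given above):

```python
from statistics import mean

group_a = [3, 5, 5, 7, 8, 8]                      # six independent observations
group_b_clusters = [[4], [5], [5], [17, 16, 19]]  # last three scores: one participant

# Naive analysis: treat every score in group B as independent.
naive_b_mean = mean(score for cluster in group_b_clusters for score in cluster)

# Cluster-aware analysis: average within each cluster first.
cluster_b_mean = mean(mean(cluster) for cluster in group_b_clusters)

print(mean(group_a))    # group A mean: 6
print(naive_b_mean)     # 11: looks very different from group A...
print(cluster_b_mean)   # about 7.83: the apparent difference shrinks
```

Down-weighting the musician's three scores to one cluster mean brings group B much closer to group A; the mixed model achieves the same protection in a principled way rather than by hand-averaging.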

Second, the Linear Mixed Model will help us avoid under-interpreting small effects when they occur within clusters. Imagine we have a dataset where participants A, B, C, D, and E all do a memory task twice, once with and once without a cognitive enhancing drug. Their scores are as follows:

Participant   No drug   Drug
A                   3      5
B                   9     10
C                  18     19
D                  20     22
E                  28     29

Here, if we just compare the averages of the two conditions without considering the clustering by participant, we could easily conclude that the drug does not do much. The condition means differ by only 1.4, which seems a trivial amount when you consider the huge variation in performance on this task (the scores range from 3 to 29).

However, when you look at the data set out in the table, it’s clear that most of the variation is between participants: some participants always get high scores (28 and 29) and some always get low ones (3 and 5). But looking within a participant, you can see that every participant does a bit better in the drug condition than in the no-drug condition. A more precise estimate of the ACE of the drug on performance comes from comparing each participant’s performance in the drug condition with their own baseline in the no-drug condition (not with the average performance of all the participants in the no-drug condition). The Linear Mixed Model estimates exactly this, by specifying which observations belong to the same cluster.
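The point can be seen by computing the numbers directly. The paired differences below are not the mixed-model estimate itself, just a plain-Python illustration of where the information in these data lies:

```python
from statistics import mean, stdev

no_drug = [3, 9, 18, 20, 28]   # participants A-E, from the table above
drug    = [5, 10, 19, 22, 29]

# Between-condition comparison: a difference of 1.4 against huge spread.
diff_of_means = mean(drug) - mean(no_drug)
print(diff_of_means)           # about 1.4
print(stdev(no_drug))          # roughly 9.8: most variation is between people

# Within-participant comparison: each person against their own baseline.
within = [d - n for d, n in zip(drug, no_drug)]
print(within)                  # [2, 1, 1, 2, 1]: small but consistently positive
print(stdev(within))           # roughly 0.55: tiny spread around the drug effect
```

The effect is the same size either way (1.4 points), but measured against within-participant variation it is large and consistent, which is exactly the comparison the mixed model makes.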

The Linear Mixed Model works by introducing into our familiar General Linear Model equation additional terms specifying what are called the random effects. This is a misleading term unless you are a mathematician, since the random-effects terms are not generated randomly. What the random-effects terms do is identify which cases belong to the same cluster as which others. The model fitting then estimates a coefficient (or coefficients, in complex cases) for each cluster. That coefficient accounts for the fact that different clusters have different means. When the clusters are participants, this ensures that any within-subjects differences will be interpreted as departures from that participant’s own baseline, and that multiple observations from the same participant will not be over-weighted for between-subjects variables.
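For a single predictor with one random intercept per cluster, one common way of writing such an equation (this notation is a standard convention, assumed here rather than quoted from elsewhere in the text) is:

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here y_ij is observation i from cluster j, and u_j is the per-cluster coefficient described above: it shifts cluster j's baseline away from the overall intercept, so that the fixed effect of the predictor is estimated relative to each cluster's own level.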

All of this is much easier to demonstrate in practice than it is to explain theoretically, so let’s work through an example.