3.3 General Linear Models with a single predictor
The referendum opinion poll example was very simple: the parameter we wanted to estimate was just a proportion in the population. In research, the parameters you want to estimate are more often the effects of a change in some IV or predictor variable on some DV or outcome variable. That’s the kind of parameter you find in the General Linear Model. You want to be able to apply a General Linear Model to real data sets, so in this section, we will work again on the behavioural inhibition dataset we met in chapter 2.5.
3.3.1 Loading in the behavioural inhibition data, again
First we are going to load the data in, as we previously did. Your script to do this should look like this:
# Script to analyse behavioural inhibition data
# Load up tidyverse
library(tidyverse)
# Read in the data
d <- read_csv("https://bit.ly/inhibitiondata")
# Rename the first column
colnames(d)[1] <- "Participant"
# Recode the Condition variable with nicer labels
d <- d %>% mutate(Condition = case_when(
  Mood_induction_condition == 1 ~ "Negative",
  Mood_induction_condition == 2 ~ "Neutral"))
Run this: you should have a data frame d in your environment, with 58 observations of 14 variables.
3.3.2 A first parameter estimate
The experimental prediction in Paál et al. (2015) was about the difference in SSRT between people in the negative and neutral conditions. (We are leaving aside the other predictions, about socioeconomic deprivation and age, for now; we will return to them.) So, let us set up a model of this situation.
Let’s say that in the population, there is some average SSRT of people who are in neutral moods. We can represent this parameter with the symbol \(\beta_0\). So, in the population, the average SSRT of people in neutral moods is as follows:
\[E(SSRT_{neutral}) = \beta_0 \] What about people in negative moods? Their average SSRT is going to differ from the average SSRT of people in neutral moods by some amount, which we can capture with the parameter \(\beta_1\). We are not prejudging the question of whether SSRT is higher, lower, or the same for people in negative moods as compared with neutral moods. \(\beta_1\) might turn out to be equal to 0, in which case mood makes no difference to SSRT. But \(\beta_1\) might also turn out to be different from zero, in either direction. Under our model, then, the average SSRT of people in negative moods will be given by:
\[ E(SSRT_{negative}) = \beta_0 + \beta_1 \]
Putting this together, we can say that the expected value of someone’s SSRT is going to be:
\[ E(SSRT) = \beta_0 + Condition * \beta_1 \] Here, \(Condition\) represents a variable that takes the value 0 if their mood is neutral, and 1 if their mood has been made negative. What we want from our model, of course, is estimates of \(\beta_0\) and \(\beta_1\), plus the precision of those estimates. It turns out (again, I won’t go into the maths) that the best possible estimate I can make of \(\beta_0\) is the mean SSRT in the neutral condition of my sample; and the best possible estimate I can make of \(\beta_1\) is the difference in mean SSRTs between the neutral and negative conditions of my sample.
Let’s now see how this works by fitting a General Linear Model to the data. We do this with the R function lm():
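# Fit a General Linear Model predicting SSRT from Condition, and assign it to m1
m1 <- lm(SSRT ~ Condition, data = d)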
This says: fit a General Linear Model to the data in data frame d, in which the variable SSRT is predicted by the variable Condition; then assign this model to the object m1. You could call the model something else if you like; it’s up to you. Also, if you have named your data frame something other than d, then you will need to modify the lm() call appropriately.
Now we have our model object (it should have appeared in your Environment window), let us see what it contains. We do this with the function summary():
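summary(m1)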
##
## Call:
## lm(formula = SSRT ~ Condition, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -112.19 -23.37 -0.17 25.86 150.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 242.09 8.06 30.04 <2e-16 ***
## ConditionNeutral -7.57 11.60 -0.65 0.52
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.1 on 56 degrees of freedom
## Multiple R-squared: 0.00754, Adjusted R-squared: -0.0102
## F-statistic: 0.426 on 1 and 56 DF, p-value: 0.517
What does this summary tell us? There is a parameter estimate called Intercept, and one that estimates the effect of Condition being Neutral rather than Negative (-7.569). This is an inconvenient way round for our purposes. The way I set up the example before, I was treating SSRT in Neutral as the baseline case, and SSRT in Negative as the departure from this. But, because Negative comes before Neutral in alphabetical order, R has taken Negative as the baseline. Before we go any further, let’s fix this. Run the following line:
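# Make Condition a factor with Neutral as the first (reference) level
# (one way to do it; base R assignment to d$Condition would work equally well)
d <- d %>% mutate(Condition = factor(Condition, levels = c("Neutral", "Negative")))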
This says: treat Condition as a factor (a qualitative variable whose levels have a specified order) and specify the order of the levels as Neutral first and Negative second (this is also called setting Neutral as the reference category). Now let’s fit and summarise our model again. To save space, I will just show the coefficients part of the summary.
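# Refit the model with the releveled Condition, showing just the coefficients
m1 <- lm(SSRT ~ Condition, data = d)
summary(m1)$coefficients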
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 234.52 8.34 28.110 1.05e-34
## ConditionNegative 7.57 11.60 0.652 5.17e-01
Now Intercept represents an estimate of \(\beta_0\) as we originally defined it (i.e. average SSRT in neutral mood), and ConditionNegative represents \(\beta_1\), the difference in average SSRT when mood is negative instead of neutral. So, interpreting the first column, \(\beta_0\) is about 235 msec, and \(\beta_1\) about 8 msec. SSRT is estimated as a bit higher in negative mood, but only a tiny bit (8 msec more on a baseline of 235).
How do these numbers relate to the raw data? Let’s get the key descriptive statistics again:
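# Mean SSRT in each condition (the column name M matches the output below)
d %>% group_by(Condition) %>% summarise(M = mean(SSRT))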
## # A tibble: 2 × 2
## Condition M
## <fct> <dbl>
## 1 Neutral 234.5
## 2 Negative 242.1
We can see that the Intercept, the estimate of \(\beta_0\), is just the mean SSRT in the neutral condition, about 235; and the estimate of \(\beta_1\) is just the difference in mean SSRTs between the two conditions (about 8 after rounding).
Make sure you are clear on everything to this point. This is important stuff!
3.3.3 Bringing in imprecision
Now we need to get a sense from our model of how precise our parameter estimates are. The standard error of the estimates is reported in the second column of the model summary, entitled Std. Error. Here, for \(\beta_1\), the standard error is about 11, whereas the estimate itself is only about 8. In other words, we estimate \(\beta_1\) as 8 msec (SSRTs are 8 msec longer for people in bad moods in the population), but we acknowledge that the typical error of this estimate is 11. In other words, the true value could easily be 8 + 11 = 19, and could easily be 8 - 11 = -3. Alternatively, we can get the confidence interval for our estimate of \(\beta_1\). This is how you do it:
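confint(m1)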
## 2.5 % 97.5 %
## (Intercept) 217.8 251.2
## ConditionNegative -15.7 30.8
What this tells us is that, if we repeated the experiment again and again, we think that 95% of the time we would get an estimate of \(\beta_0\) between about 218 and 251, and an estimate of \(\beta_1\) between about -16 and +31. So, the bad news is that the effect of negative mood on SSRT could, in the light of our data, be negative, zero, or positive, since all of these possible scenarios are contained within the 95% confidence interval of the parameter estimate.
3.3.4 A General Linear Model with a continuous predictor
The model m1 had one predictor, experimental condition, which was binary (i.e. Negative versus Neutral). How do we fit a General Linear Model when our predictor is a continuous variable?
Everything is pretty much the same. Let’s consider the case of whether SSRT is predicted by the variable Age. We assume that there is some baseline value of SSRT at age zero, which we represent by the parameter \(\beta_0\). Then we assume that SSRT changes by an amount \(\beta_1\) with each additional year of age that passes (note, therefore, that we are assuming a linear relationship between age and SSRT, at least across the range that we are studying). So, the expected value of a person’s SSRT under this model is: \[E(SSRT) = \beta_0 + \beta_1 * Age\]
If \(\beta_1\) is a positive number, older people have higher SSRTs; if it is a negative number, older people have lower SSRTs; and if \(\beta_1 = 0\), then SSRT does not change with age. Note the slight difference in interpretation: for model m1 with a binary predictor, \(\beta_1\) represents the difference in expected SSRT when you go from Neutral to Negative; in this model where the predictor is continuous, \(\beta_1\) represents the difference in expected SSRT when Age increases by one unit (i.e., one year).
We fit this model using the lm() function in exactly the same way as before:
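# Fit a model predicting SSRT from Age, assign it to m2, and show the coefficients
m2 <- lm(SSRT ~ Age, data = d)
summary(m2)$coefficients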
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 208.145 13.750 15.14 2.90e-21
## Age 0.933 0.383 2.44 1.81e-02
The estimate of \(\beta_0\) is about 208, and the estimate of \(\beta_1\) is about 1, suggesting that SSRT goes up by about 1 msec with every year older a person is. Though m2 is fine, it is not perhaps expressed in the most easily interpretable way. \(\beta_0\) here represents the expected SSRT of someone with an age of 0. The age of 0 years is way outside the range of the data (the youngest participant is 19), so extrapolating the SSRT of someone with an age of 0 isn’t statistically very defensible. More importantly, a one-day-old baby could not possibly do the task anyway, so this parameter does not make sense. We would be better off setting our zero point for the Age variable somewhere else, such as at the average age of a member of the sample.
We do this by centring the Age variable. This means putting the zero value in the middle of the distribution, and expressing the other values as negative or positive deviations from the middle. When you centre a variable, you also have the option of scaling it. Scaling means dividing by the standard deviation, so that the standard deviation of the variable becomes 1. (Standardizing a variable means both centring and scaling it, so that it ends up with a mean of 0 and a standard deviation of 1.)
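To make the difference concrete, here is a quick toy illustration (the numbers are made up for the example):
x <- c(20, 30, 40, 50)
x - mean(x)            # centred: mean is now 0, spread unchanged
(x - mean(x)) / sd(x)  # centred and scaled, i.e. standardized: mean 0, SD 1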
In modelling data, I recommend always centring your continuous predictor variables, for a number of reasons, including avoiding parameter estimates that make no intuitive sense. This centring will become even more useful later when there are multiple predictor variables, and especially when there are interactions between them. Whether you should scale or not depends. Where the predictor has easily interpretable units, as with age, which has units of years, I would probably keep it unscaled, so the interpretation is intuitive. If you scale it, the parameter estimate for Age comes to represent the expected change in SSRT when Age changes by one standard deviation. The standard deviation of Age in this dataset is about 14.8 years, so the parameter estimate would represent ‘the amount SSRT changes when age increases by a bit less than 15 years’. The amount by which SSRT changes with every year older is simpler to explain.
So, let’s centre Age but not scale it. We do this by subtracting the mean of the variable, as shown below. You could also use the R function scale(), which can scale a variable, centre it, or both.
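# Centre Age by subtracting its mean (the name Age_centred matches the output below)
d <- d %>% mutate(Age_centred = Age - mean(Age))
# Equivalently: d %>% mutate(Age_centred = as.numeric(scale(Age, scale = FALSE)))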
Now rerun m2 and get the summary:
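m2 <- lm(SSRT ~ Age_centred, data = d)
summary(m2)$coefficients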
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 238.716 5.620 42.48 9.77e-44
## Age_centred 0.933 0.383 2.44 1.81e-02
The Intercept, which represents the expected SSRT of a person of average age, is now about 238. This is identical to the average SSRT in the sample (as you can verify using mean(d$SSRT) if you wish). And, as before, \(\beta_1\) represents the change in SSRT when age increases by a year (about 1 msec). Let’s get the confidence intervals:
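confint(m2)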
## 2.5 % 97.5 %
## (Intercept) 227.453 250.0
## Age_centred 0.165 1.7
The confidence interval for \(\beta_1\) ranges from about 0.2 to about 1.7. If we were to run the study 100 times, we would almost always get estimates in this range. Thus, though we are not sure exactly what \(\beta_1\) is, we are pretty sure it is a positive number: if we ran many studies, in almost all of them, SSRT would be higher on average in the older participants. The fact that zero is not in the confidence interval gives us grounds to believe that, in the big world of all humans, \(\beta_1 > 0\). It also gives us grounds to believe that \(\beta_1\) is not very big: because 10 is way outside the confidence interval, the data are incompatible with the hypothesis that SSRT increases by 10 msec on average with every year of age. We return to how to test hypotheses using the output of your General Linear Model in chapter 4.