5.6 When should you fit interactions?
Interactive models allow the effect of one predictor variable to depend on the level of another. They are in a sense more general than the corresponding additive models: the additive models are just the special case of the interactive models in which all the interaction terms are zero. Why not, then, always specify your models to include all possible interaction terms? The problem is that interactive models quickly become very complex. The number of possible interaction terms rises rapidly with the number of predictor variables (with ten predictors there are 45 possible two-way interactions alone, and over a thousand interaction terms of all orders). Other things being equal, simpler models are preferable to more complex ones. The results can also become hard to interpret, or to relate to your question, if you have many interaction terms.
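To make the distinction concrete, here is a minimal sketch (in Python with statsmodels, which may not be the software used elsewhere in this book; the variable names y, x1, and x2 are hypothetical). The additive and interactive specifications differ only in whether the product term is included:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with two hypothetical predictors, x1 and x2
rng = np.random.default_rng(1)
n = 200
d = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
d["y"] = 0.5 * d["x1"] + 0.3 * d["x2"] + 0.4 * d["x1"] * d["x2"] + rng.normal(size=n)

# Additive model: the effect of x1 is assumed to be the same at every value of x2
additive = smf.ols("y ~ x1 + x2", data=d).fit()

# Interactive model: x1 * x2 expands to x1 + x2 + x1:x2, so the effect of x1
# is allowed to change with the value of x2
interactive = smf.ols("y ~ x1 * x2", data=d).fit()

print(interactive.params)  # includes an x1:x2 coefficient
```

Setting the x1:x2 coefficient to zero recovers the additive model, which is exactly the sense in which the interactive model is more general.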
As a rule of thumb, for observational studies, especially those with many potential predictors, only fit interaction terms where you have some reason to expect them: either a theoretical prediction or an empirical hunch that they might be important. Otherwise, just specify the additive terms as a starting point.
For experimental studies, there is a tradition of fitting all the interaction terms involving your IVs (i.e. your manipulated variables). This makes a lot of sense. If your two IVs are whether the participants took caffeine or not, and whether they did the task in the light or in the dark, then you will want to ask whether caffeine had the same effect on the DV in the light as in the dark, as well as whether caffeine had an effect overall. You would do this by fitting a model with the two main effects and the interaction term. For statistical tests in such a case, I would recommend getting the ANOVA table (see chapter 7).
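For the caffeine-and-lighting example, the specification might look like the following sketch (again Python with statsmodels, with simulated data standing in for real measurements):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical 2 x 2 design: caffeine (no/yes) crossed with lighting (dark/light)
rng = np.random.default_rng(2)
n_per_cell = 30
d = pd.DataFrame({
    "caffeine": np.repeat(["no", "yes"], 2 * n_per_cell),
    "lighting": np.tile(np.repeat(["dark", "light"], n_per_cell), 2),
})
d["score"] = rng.normal(size=len(d))  # stand-in DV

# Both main effects plus their interaction
m = smf.ols("score ~ caffeine * lighting", data=d).fit()

# The ANOVA table gives one test per main effect and one for the interaction
print(anova_lm(m, typ=2))
```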
As for covariates within experimental studies (non-manipulated variables that might nonetheless associate with the DV), you don’t usually fit the interactions between them and the IVs, or among the covariates themselves if there are several, unless you have some reason to expect them (as we did in the Fitouchi case). This is, however, only a rule of thumb and might not always be the best approach. The most important thing is to make sure that the specification of your model relates to your research question and predictions. If the question of interest involves one variable modifying the effect of another, then obviously you need to include the relevant interaction term in the specification of your model. Your model can only ‘see’ the things in the data that its specification allows for.
5.6.1 Use interaction models to test for differences in association across levels of another variable
A very important thing to stress is that if you want to claim that the effect of one variable differs according to the level of another variable, you must test this with a statistical test arising from an interactive model. People very often fail to do this right. Instead, they do the following: they divide their data into subsets based on some grouping variable, such as gender. They find that, say, their experimental effect is ‘significant’ (p < 0.05) in the women subset but not in the men subset, and they draw some conclusion like ‘this treatment has an effect in women but not in men’.
However, they have not shown that the effect of the treatment is any different in women and in men. If you split your dataset in half, your statistical power to detect an effect will be lower in either of the halves than in the whole dataset. So, if you make a load of subsets, even by random sampling, the effect is going to be ‘significant’ (p < 0.05) in some of the subsets and not others, just by chance. This is true even if the actual underlying effect is consistent across all cases. It may well be that your parameter estimate is similar in the women and men subsets (at least, with overlapping confidence intervals), but the p-value happens to fall on one side of the significance line in one case and the other side in the other. So finding a difference of significance across genders does not mean that you have found significance of difference.
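To see why, here is a small simulation sketch (again Python with statsmodels, and the numbers are made up): every simulated participant gets the same underlying treatment effect, the data are then split into two arbitrary subsets, and the treatment is tested separately in each. Over repeated runs, it is common for the effect to come out ‘significant’ in one subset but not the other, purely by chance:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 120
mismatches = 0
for _ in range(200):
    # One consistent treatment effect for everyone
    d = pd.DataFrame({"treatment": rng.integers(0, 2, size=n)})
    d["y"] = 0.4 * d["treatment"] + rng.normal(size=n)
    # Split at random into two arbitrary subsets (standing in for e.g. women/men)
    d["group"] = rng.integers(0, 2, size=n)
    p = [
        smf.ols("y ~ treatment", data=d[d["group"] == g]).fit().pvalues["treatment"]
        for g in (0, 1)
    ]
    # Count runs where one subset is 'significant' and the other is not
    mismatches += (p[0] < 0.05) != (p[1] < 0.05)

print(f"{mismatches} of 200 runs gave 'significant' in one subset only")
```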
If you want to test the hypothesis that the effect of your treatment is different in women and men, the only sound way of doing this is to test whether the interaction term is significantly different from zero in a model containing treatment, gender, and the interaction of treatment and gender. Very often you will find that the interaction term is not significantly different from zero (and therefore there is no evidence of a difference in effect by gender), even if the effect of treatment is significant in one subset of the data and not in the other. Statisticians have a mantra: ‘difference of significance is not significance of difference!’.
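In code, the sound approach looks like the following sketch (hypothetical variable names again): the line for the treatment-by-gender interaction, not a comparison of subset p-values, is the test of whether the effect differs by gender.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data: treatment (0/1) and gender recorded for every participant
rng = np.random.default_rng(4)
n = 200
d = pd.DataFrame({
    "treatment": rng.integers(0, 2, size=n),
    "gender": rng.choice(["women", "men"], size=n),
})
d["y"] = 0.4 * d["treatment"] + rng.normal(size=n)  # same effect in both genders

# Full model: treatment, gender, and their interaction
m = smf.ols("y ~ treatment * gender", data=d).fit()

# The treatment:gender line is the test of whether the effect differs by gender
print(anova_lm(m, typ=2))
```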