12.3 Choosing an analysis strategy
In this book we have met a number of statistical tools, and a number of ways of testing hypotheses using them. This raises the question: how do you choose which tools you are going to use for a particular job? This is the question of choosing your analysis strategy. You need to be thinking about it already as you design your study, and certainly have settled on it by the time of the pre-registration. It is much easier to find guidance on how to use a particular tool given that you have chosen to use it, than guidance on which tool to choose. There are few hard-and-fast rules, and people's preferences vary. This section sets out some rules of thumb that I find useful. I will present these separately for experimental and non-experimental studies.
12.3.1 Experimental studies
In an experimental study, you are interested in the effects of one or more IVs on one or more DVs. The IVs are manipulated variables with random assignment. Usually, when there is more than one IV, they are factorially combined, so every level of IV1 is equally likely to appear in combination with every level of IV2.
In an experimental scenario, the basic model just has the DV on the left and the IVs on the right (we will return to what to do if there is more than one DV in section 12.4.2). Generally you should try to avoid putting any other variables into the model, other than any random effects required due to clustering in the data (see 6.4). The simple model with no additional covariates gives you unbiased estimates of your estimands of interest, which are the ACEs of the IVs.
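To make this concrete, here is a minimal sketch in Python (using statsmodels, with simulated data and hypothetical variable names dv, condition, and lab) of the basic experimental model, with and without a random effect for clustering:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: one manipulated IV, one clustering variable (hypothetical names).
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "condition": rng.choice(["control", "treatment"], size=n),
    "lab": rng.choice(["A", "B", "C"], size=n),
})
df["dv"] = 10 + 2 * (df["condition"] == "treatment") + rng.normal(0, 3, n)

# Basic model: the DV on the left, the manipulated IV on the right.
fit = smf.ols("dv ~ condition", data=df).fit()
print(fit.params)  # the condition coefficient estimates the ACE

# If observations are clustered (e.g. participants within labs), add a
# random intercept for the cluster rather than extra covariates.
mixed = smf.mixedlm("dv ~ condition", data=df, groups=df["lab"]).fit()
print(mixed.summary())
```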
If you have multiple IVs, the default is to include the interactions between them. If one of your treatments modifies the effect of the other, that’s something you want to know. If there are no interactions, your model will still recover the main effects.
For continuous DVs, consider presenting null hypothesis significance tests using ANOVA (chapter 7). In the experimental case, testing against a null hypothesis usually makes sense, and ANOVA has several advantages: it provides a single test when the IV has more than two levels; it provides an interpretable test of main effects in the presence of interactions; and its results are insensitive to the choice of reference levels and zero points. Supplement your null hypothesis significance tests with measures of effect size such as standardized mean differences, or at the very least give the means and standard deviations for the individual experimental groups. Consider incorporating equivalence tests (section 4.3), which will be particularly relevant if your results are null (and pre-register your SESOI).
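As an illustrative sketch (simulated data, hypothetical names iv1 and iv2, and a SESOI of ±1 unit chosen purely for the example), this is how the factorial ANOVA, cell descriptives, a standardized mean difference, and an equivalence test might be produced with statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "iv1": rng.choice(["low", "high"], size=n),
    "iv2": rng.choice(["off", "on"], size=n),
})
df["dv"] = (5 + 1.5 * (df["iv1"] == "high")
            + 0.5 * (df["iv2"] == "on") + rng.normal(0, 2, n))

# Factorial model, with the interaction included by default.
fit = smf.ols("dv ~ iv1 * iv2", data=df).fit()
print(anova_lm(fit, typ=2))  # one test per main effect, plus the interaction

# Descriptives for each experimental cell.
print(df.groupby(["iv1", "iv2"])["dv"].agg(["mean", "std", "count"]))

# A standardized mean difference (Cohen's d) for iv1.
g = df.groupby("iv1")["dv"]
m, s, c = g.mean(), g.std(), g.count()
pooled_sd = np.sqrt(((c - 1) * s**2).sum() / (c.sum() - 2))
print("Cohen's d:", (m["high"] - m["low"]) / pooled_sd)

# An equivalence test (TOST) for iv2 against a SESOI of +/-1 unit.
p, _, _ = ttost_ind(df.loc[df["iv2"] == "on", "dv"],
                    df.loc[df["iv2"] == "off", "dv"], low=-1, upp=1)
print("TOST p-value:", p)
```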
People are often keen to add covariates to the basic model in which the DV is predicted by the IVs. Try to avoid this. Sometimes the variable they want to add is some aspect of the experimental procedure, such as the order of conditions. Here, you should counterbalance in the design, which means you don’t need to also control for it in the model.
At other times people are tempted to add non-manipulated participant variables they feel are relevant to the phenomenon they are studying (e.g. gender, personality, political orientation). Ask yourself why you feel tempted to include these. The random assignment means they can’t confound any systematic effects of your IV, so they are not confounders. The whole point of doing well-designed experiments is that there are no confounding variables. If you are worried that there are big gender differences and that variation in the gender composition of your groups could mask your experimental effect, then balance your groups by gender, so you do not need to include it in the model.
Historically, researchers were often tempted to add a few participant measures such as personality, attitudes, and so on, often completed in a questionnaire at the end of the experiment. They felt that these didn’t add much to the participant burden, and ‘increased the chances of finding something’. Well indeed they did, and that’s the problem. What we are trying to do is reduce the false positive rate, not add to it. So, you need to ask yourself honestly: is my research question really about the modification of some experimental effect by a non-manipulated participant variable? Or, alternatively, am I just throwing extra measures in speculatively? If the latter, cut the measure. If the former, then include the measure, both as a main effect and in interaction with the IV, and pre-register this as a central research question. But, try to keep effect modifiers like this to a minimum. Also, consider doing the study without the potential effect modifier initially, and including the potentially modifying variable in a replication or follow-up.
There is another situation where including a covariate is justified. This is where your DV naturally scales with some other quantity. An example would be body size in the case where your DV is metabolic energy expenditure. Bigger bodies expend more energy. In effect, the real DV here is not overall energy expenditure, but energy expenditure for a given body size, so you want to account for body size in the model.
In such a case, it's fine to include the covariate (as long as the covariate cannot be affected by your IV or DV; that is, it is a pre-treatment variable, see section 3.4.2). By the way, you get an unbiased estimate of your ACE with or without the covariate included. The difference is that you may get better precision on your estimate by including it, because the covariate explains much of the variation in the DV. Present a sensitivity analysis of the effect of including or excluding it.
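A minimal simulation sketch of this point (with hypothetical names treatment, body_mass, and energy): the treatment estimate is essentially unchanged by including the covariate, but its standard error shrinks:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 150
df = pd.DataFrame({"treatment": rng.choice([0, 1], size=n)})
df["body_mass"] = rng.normal(70, 10, n)  # a pre-treatment covariate
df["energy"] = (50 + 8 * df["treatment"]
                + 2.5 * df["body_mass"] + rng.normal(0, 5, n))

without = smf.ols("energy ~ treatment", data=df).fit()
with_cov = smf.ols("energy ~ treatment + body_mass", data=df).fit()

# Both estimates of the ACE are unbiased; the covariate-adjusted one
# should have a noticeably smaller standard error.
for label, fit in [("without covariate", without), ("with covariate", with_cov)]:
    print(label, "| estimate:", round(fit.params["treatment"], 2),
          "SE:", round(fit.bse["treatment"], 2))
```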
In cases like this, always include the scaling covariate on the right-hand side of the model. Sometimes people are tempted to account for it on the left-hand side, for example by making the DV be ‘energy expenditure divided by body mass’. This is problematic, because it makes the strong assumption that the scaling relationship between body size and energy expenditure has a slope of one and an intercept of zero. It might not. It might also be non-linear, something you should aim to establish in order to understand how best to account for it.
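Continuing the sketch above, the ratio approach can be contrasted with keeping the covariate on the right-hand side; a log-log model, one plausible way to handle non-linear scaling, estimates the scaling exponent rather than assuming it:

```python
# The ratio DV silently assumes energy scales with body_mass with a slope
# of exactly one and an intercept of zero; the data may not satisfy this.
ratio_fit = smf.ols("I(energy / body_mass) ~ treatment", data=df).fit()

# Keeping the covariate on the right-hand side makes no such assumption...
rhs_fit = smf.ols("energy ~ treatment + body_mass", data=df).fit()

# ...and a log-log model estimates the scaling exponent from the data.
loglog = smf.ols("np.log(energy) ~ treatment + np.log(body_mass)", data=df).fit()

print("ratio model:    ", round(ratio_fit.params["treatment"], 3))
print("covariate model:", round(rhs_fit.params["treatment"], 3))
print("scaling exponent:", round(loglog.params["np.log(body_mass)"], 2))
```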
Consider whether you are going to employ any transformations of the DV, and whether you are going to exclude any cases. You should not exclude cases based on their value on the DV. You might, however, pre-register to exclude cases based on some other criterion, like reaction time or the participant failing comprehension checks. When you employ transformations and exclusions, analyse the sensitivity of your conclusions to these decisions.
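A hedged sketch of what such a sensitivity analysis might look like, with hypothetical exclusion criteria (reaction time, a comprehension check) and a log transformation of the DV:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 200
df = pd.DataFrame({
    "condition": rng.choice([0, 1], size=n),
    "reaction_time": rng.gamma(2.0, 1.5, size=n),  # seconds, hypothetical
    "passed_check": rng.choice([True, False], size=n, p=[0.9, 0.1]),
})
df["dv"] = 20 + 0.6 * df["condition"] + rng.normal(0, 2, n)

# Re-run the same model under each pre-registered decision and compare.
variants = {
    "all cases, raw DV": (df, "dv ~ condition"),
    "passed comprehension check": (df[df["passed_check"]], "dv ~ condition"),
    "RT under 10 s, log DV": (df[df["reaction_time"] < 10],
                              "np.log(dv) ~ condition"),
}
for label, (data, formula) in variants.items():
    fit = smf.ols(formula, data=data).fit()
    print(label, "| estimate:", round(fit.params["condition"], 3),
          "p:", round(fit.pvalues["condition"], 3))
```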
12.3.2 Observational studies
For observational studies, the first decision is whether you are going to fix your model a priori, or whether you are going to define a candidate set of models and use model selection to choose the best one. It makes sense to fix it if you have a small number of predictor variables of primary interest, and you know what the potentially necessary covariates are. You can then report the parameter estimates and standard errors or confidence intervals for each of your variables of interest, given that you have controlled for the covariates. Be careful, though, if several of your predictor variables are highly correlated (more than about 0.7, though this is just a rule of thumb). The estimates for each of the correlated variables after controlling for the others will be unstable and have low precision. Here, you would be better off using model selection to choose which ones to include, choosing just one of them, or else combining them into an index (see 12.4.2).
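Before committing to a fixed model, you can check for this kind of collinearity by inspecting the correlation matrix and the variance inflation factors; a sketch with simulated predictors (x2 deliberately constructed to correlate with x1):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)  # deliberately correlated with x1
x3 = rng.normal(size=n)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

print(X.corr().round(2))  # flag pairs above roughly 0.7

# Variance inflation factors: values well above ~5 warn of instability.
Xc = sm.add_constant(X)
for i, name in enumerate(Xc.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(Xc.values, i), 1))
```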
Using model selection makes sense where there is an open question concerning which set of predictors best explains the variation in the outcome (as we saw in chapter 8). Model selection will also sort out for you which of several highly correlated variables (or versions of variables, as when you test a logarithmic model against a linear one, 8.3) is the best predictor.
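A minimal illustration of model selection by AIC, assuming statsmodels and simulated data in which the true relationship happens to be logarithmic:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 300
df = pd.DataFrame({"x": rng.uniform(1, 10, n)})
df["y"] = 2 + 3 * np.log(df["x"]) + rng.normal(0, 1, n)

# A small candidate set: a linear versus a logarithmic form of the predictor.
candidates = {
    "linear": smf.ols("y ~ x", data=df).fit(),
    "log": smf.ols("y ~ np.log(x)", data=df).fit(),
}
for name, fit in candidates.items():
    print(name, "AIC:", round(fit.aic, 1))  # lower AIC is preferred
```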
In an observational study, you are more likely to need to include covariates, exactly because, without experimental manipulation, there are likely to be confounding variables, and hence the risk of omitted variable bias. Be careful, though, to explicitly justify which covariates you include, and why, to avoid bad control. Don’t include the kitchen sink. You should only include genuine confounder candidates. You should not include variables that could lie on the causal pathway between the predictor and the outcome (mediators), or variables that could be affected by the outcome (potential colliders). If you are not doing model selection, you will need to present sensitivity analysis of the conclusions about one variable to the inclusion or exclusion of another.
An issue that can arise in observational studies, especially those of a more exploratory kind, is that the associations you are interested in could be non-linear, and you don't know in advance what form they will take. You can allow for some forms of non-linearity by transforming the predictors (using the logarithm of the predictor to allow for a decelerating relationship; including the predictor squared alongside the predictor itself to allow for a U-shaped relationship). But this is not a panacea: what if the relationship has an even more complex shape, or a shape you cannot anticipate?
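Reusing the simulated data frame from the AIC sketch above, a squared term goes inside the model formula like this (I() protects the arithmetic from being read as formula syntax):

```python
# Allowing a U-shaped (or inverted-U) relationship by including the
# predictor and its square.
quad = smf.ols("y ~ x + I(x**2)", data=df).fit()
print(quad.params)
```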
There are classes of models, notably Generalized Additive Models (GAMs), that allow you to deal with non-linear relationship shapes in more flexible ways. They are beyond the scope of this book (though see a little taster in section 8.3). Their strength is their lack of strong assumptions about the form relationships must take. Their weakness is that they can be difficult to interpret and compare across datasets, and they can overfit the dataset at hand.
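statsmodels includes one implementation of GAMs; this is an illustrative sketch only (continuing with the simulated data above; the number of spline basis functions is an arbitrary choice for the example):

```python
from statsmodels.gam.api import GLMGam, BSplines

# A B-spline smooth of x lets the shape of the relationship be learned
# from the data rather than specified in advance.
bs = BSplines(df[["x"]], df=[6], degree=[3])
gam = GLMGam.from_formula("y ~ 1", data=df, smoother=bs).fit()
print(gam.summary())
```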
In a more exploratory observational study, what you are really interested in is not maximizing the fit of the model in your current dataset, but finding the model that will predict the same outcome well in other datasets (that is, you are interested in out-of-sample generalization). The model with the lowest AIC should in principle be the best model in this sense. But you don't really know how good it is until you test it out on a new dataset. If a new dataset is not going to be available, but your current dataset is large, you can cross-validate your model by holding back a random sample of the data (say one fifth or one third), and testing how well the model established on the rest of the data predicts the outcome in the held-out sample. There are techniques based on doing this repeatedly, holding out different random subsets each time.
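A sketch of a simple hold-out check, reusing the simulated data above (scikit-learn's model_selection utilities, such as KFold, implement the repeated version):

```python
# Hold back a random fifth of the data, fit on the rest, and see how well
# the fitted model predicts the held-out outcomes.
rng = np.random.default_rng(7)
test_idx = rng.choice(len(df), size=len(df) // 5, replace=False)
test = df.iloc[test_idx]
train = df.drop(df.index[test_idx])

fit = smf.ols("y ~ np.log(x)", data=train).fit()
pred = fit.predict(test)
rmse = np.sqrt(np.mean((test["y"] - pred) ** 2))
print("held-out RMSE:", round(rmse, 2))
```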