12.4 Trimming your analysis strategy

First drafts of analysis strategies are often too messy and too big. You need to trim them down at this point to make them clearer, leaner and more convincing. Trimming the analysis strategy involves pruning the number of models you fit. In my experience, this often means going back to the measures you intend to collect and removing some. Sometimes it even means deleting some of your research questions; it is better to focus on the main ones. This is why it is so crucial to think in detail about data analysis before the design is finalized and the data collected.

12.4.1 Too many questions

Try to proceed one question at a time. For example, why think about effect modifiers or mediating variables before you have shown that your effect reliably exists at all? Modern papers in psychology and behavioural science usually include multiple studies in series, so save follow-up questions for studies two, three and four. Including them prematurely will just produce a mass of confusing analyses in which it is hard to see the forest for the trees, and the whole will be rhetorically and epistemically unconvincing.

12.4.2 Too many versions of the DV

Often there are multiple possible ways of measuring the DV, such as different self-report scales, or explicit and implicit measures. You will be tempted to gather or use all of them and ‘see which one works best’. This is suboptimal: it gives you many researcher degrees of freedom and plenty of potential for false positives when one version of the DV seems to show something and the others do not.

Rather than including all the possible outcome measures in the study, do a validation study to decide on the best one beforehand. In a validation study, you assess the measurement properties of all the candidate measures. This can involve investigating how they relate to one another - maybe they are so highly correlated that in the main study it does not matter which one you choose. More often, validation involves relating the measures to some unambiguous ‘gold standard’ criterion in a separate sample. For example, putative measures of physical activity ought to distinguish professional endurance athletes from the general population. Those that don’t are not very good measures.

If you end up with multiple versions of your DV or outcome in the dataset, and they are very highly correlated, you may be able to combine them into one index variable (a variable that is a composite of several contributor variables). If two measures of the same construct are correlated at more than about 0.7, you could simply average them. Standardize them first, so that the average is not disproportionately influenced by whichever measure has the larger variance. If there are more than two, you could consider principal components analysis (PCA) to reduce their number. PCA creates a smaller number of synthetic variables that capture the shared variance across a set of variables. PCA is only appropriate where there is enough shared variance across the measures, which you can establish with the related Kaiser-Meyer-Olkin (KMO) test. I will not give full details of how to perform PCA or the KMO test, but both are straightforward in R.
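
As a rough sketch of what this can look like in R (the data frame dat and the measure names DV1 to DV3 are hypothetical, and I use the psych package for the KMO test; other packages would do equally well):

library(dplyr)
library(psych)

# Two highly correlated measures: standardize each, then average into one index
dat <- dat %>%
  mutate(DV1_z = as.numeric(scale(DV1)),
         DV2_z = as.numeric(scale(DV2)),
         dv_index = (DV1_z + DV2_z) / 2)

# More than two measures: check shared variance with the KMO test, then run PCA
dvs <- dat %>% select(DV1, DV2, DV3)
KMO(dvs)                  # overall sampling adequacy; values below about 0.5 suggest PCA is not warranted
pca <- prcomp(dvs, scale. = TRUE)
summary(pca)              # proportion of variance captured by each component
dat$dv_pc1 <- pca$x[, 1]  # first principal component as the composite outcome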

12.4.3 Too many models

If you do end up with multiple versions of the outcome, or DVs, it is easy to end up with many models each testing basically the same question, which inflates the false positive rate. You need to combine the models into one meta-model or multivariable model, so that all tests of the same research question are done within a single model.

Let us say for example that you have an experiment with one between-subjects IV, and two versions of the DV, DV1 and DV2. You are tempted to fit two models: DV1 ~ IV and DV2 ~ IV. But this gives you two bites at what is actually one cherry. You can reduce these two to a single model by considering that what you have is two observations from each participant (one for DV1, one for DV2) of a single variable, score. The model to fit is therefore a Linear Mixed Model: score ~ IV + variable + IV:variable + (1|participant). In this model, variable has two levels, {DV1, DV2}; the main effect of IV captures whether the IV did anything to the score on average across the DVs; and the interaction term captures whether the IV did something different to DV2 than it did to DV1. To fit this model, you will need to reorganize your data into long format, using pivot_longer(). I would standardize DV1 and DV2 prior to analysis so that their means and standard deviations are the same; this means that the main effect of variable will be essentially zero.
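
Here is a minimal sketch of that workflow, assuming a wide data frame dat with columns participant, IV, DV1 and DV2 (hypothetical names), and using the dplyr, tidyr and lme4 packages:

library(dplyr)
library(tidyr)
library(lme4)

long <- dat %>%
  mutate(across(c(DV1, DV2), ~ as.numeric(scale(.x)))) %>%  # standardize each DV
  pivot_longer(cols = c(DV1, DV2),
               names_to = "variable",   # which DV the row refers to
               values_to = "score")     # the standardized value itself

# One model instead of two: main effects of IV and variable, their interaction,
# and a random intercept for each participant
m <- lmer(score ~ IV + variable + IV:variable + (1 | participant), data = long)
summary(m)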

To take another example, let’s say you have measured mood 10 minutes after your intervention, 20 minutes after, and 30 minutes after. Don’t be tempted to fit three models, one for the 10-minute data, one for the 20-minute, and one for the 30-minute, each with your IV as the predictor. Rather, fit a single Linear Mixed Model, with the predictors IV, Time and the interaction between IV and Time, where Time is a categorical variable with levels {10, 20, 30}; plus of course a random effect for participant.
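
In R, again with hypothetical names (a long data frame mood_long with columns participant, IV, Time and mood), this could look like:

library(lme4)

mood_long$Time <- factor(mood_long$Time)   # treat Time as categorical, not numeric

m_time <- lmer(mood ~ IV + Time + IV:Time + (1 | participant), data = mood_long)
summary(m_time)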

The approach known as multivariate analysis of variance (MANOVA; R function manova()) is another way of implementing this kind of multivariable strategy. It requires a different, wider data format, though, with your different versions of the DV in separate columns.
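
A minimal sketch, using the same hypothetical wide data frame dat with columns DV1, DV2 and IV; the DVs are bound together into a matrix on the left-hand side of the formula:

fit <- manova(cbind(DV1, DV2) ~ IV, data = dat)
summary(fit)        # multivariate test (Pillai’s trace by default)
summary.aov(fit)    # follow-up univariate ANOVAs, one per DV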

If you have been unable to reduce your number of versions of the outcome or DV to one, then you will have multiple models that can be thought of as testing substantively the same question. In such a case, you need to consider correction for multiple testing. This is where you adjust your p-value criterion for significance to take into account that you have had multiple bites at the cherry.
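
Base R’s p.adjust() handles the standard corrections; the p-values below are made up purely for illustration:

p_raw <- c(0.012, 0.034, 0.049)          # p-values from models testing the same question

p.adjust(p_raw, method = "holm")         # Holm correction: controls the familywise error rate
p.adjust(p_raw, method = "bonferroni")   # Bonferroni: simpler but more conservative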