7.3 What ANOVA is and how it saves the day

I have already set out the advantages of ANOVA. For variables with more than two levels, it gives you a single test statistic for the prediction ‘the means of the outcome variable differ across the levels of the predictor more than you would expect if the null hypothesis were true’. And, in models where variables appear in interaction, it gives you a single test statistic for the prediction ‘this variable affects the outcome on average across all the levels of the other variable(s)’, rather than just ‘when the other variables are at their reference level, or zero’. The results of ANOVA tests are also unaffected by centring or scaling the predictor variables.

Now, let’s get the ANOVA table, and then discuss what the numbers in it mean.

anova(m2, type=2)
## Type II Analysis of Variance Table with Satterthwaite's method
##          Sum Sq Mean Sq NumDF DenDF F value  Pr(>F)    
## luck      29205   14603     2   495   29.75 6.3e-13 ***
## het        2020    2020     1   495    4.12   0.043 *  
## luck:het    187      93     2   495    0.19   0.827    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The key numbers you are looking for are the F-ratios (F value) and their associated p-values. So we see that, in the ANOVA table, we have a significant main effect of luck (the evidence suggests that, on average across the levels of heterogeneity, luck affects the DV); a significant but much weaker main effect of heterogeneity (the evidence suggests that, on average across the levels of luck, heterogeneity affects the DV); and no significant interaction (there is no evidence that the level of luck modifies the effect of heterogeneity). These are actually the tests that are relevant to the authors’ predictions.

What is the F-ratio? I won’t go into too much detail here, but the rough idea is as follows. If the null hypothesis is true, the degree to which the group means differ from one another gives us one estimate of the variance of the DV. If the variance of the DV were zero, all the group means would be identical; the more variance the DV has, the more those means will differ just by chance, even when the null hypothesis is true (i.e. the population means for the different levels of the IV are the same). A second estimate of the variance of the DV comes from the individual cases themselves: the mean squared deviation of the cases from their group means.

The F-ratio is the ratio of the first of these variance estimates (the one that comes from the differences between the group means) to the second (the one that comes from the individual cases). If that ratio is about 1 or less, there is nothing to see here: the variance estimated from the group means is no greater than the variance estimated from the individual cases, and everything makes sense under the null hypothesis. If the ratio is much bigger than 1, something is awry. For luck, the variance estimate that the differences between the group means would imply, if the null hypothesis were true, is about 30 times larger than the variance estimate coming from the individual cases. Such a large discrepancy is unlikely to arise by chance, so it is reasonable to conclude that the null hypothesis is not true: the means of the groups from which the data are drawn are not all the same.
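
To make this concrete, here is a minimal sketch in R of the two estimates, computed by hand for a made-up one-way example. The data, group labels, and sample sizes are invented for illustration; they are not the study’s data.

# Invented one-way example: three groups of 20, pure noise, so the null is true
set.seed(1)
toy <- data.frame(group = rep(c("a", "b", "c"), each = 20),
                  y     = rnorm(60))
k <- 3      # number of groups
n <- 20     # observations per group
grand <- mean(toy$y)
means <- tapply(toy$y, toy$group, mean)
# Estimate 1: the variance implied by how far apart the group means sit
ms_between <- n * sum((means - grand)^2) / (k - 1)
# Estimate 2: the variance of the individual cases around their group means
ms_within  <- sum((toy$y - means[toy$group])^2) / (k * n - k)
ms_between / ms_within              # the F-ratio; near 1 when the null is true
anova(lm(y ~ group, data = toy))    # the built-in table reports the same F

Because these toy data are pure noise, the hand-computed ratio should land near 1, and it matches the F value that anova() reports for the same toy model.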

An F-ratio has two associated numbers, its numerator degrees of freedom and its denominator degrees of freedom, and you can see both in the table. They tell you how precise the two variance estimates that form the numerator and denominator of the F-ratio are (how many differences they are based on). For example, the numerator degrees of freedom for luck is 2 because luck has three levels, and the numerator degrees of freedom is the number of levels minus one. The denominator degrees of freedom is the number of observations minus a number that reflects how many parameters you are estimating in the model, so it is usually a little smaller than the number of observations. Here it is quite a lot smaller (495 for 600 observations) because, this being a mixed model, we are also estimating a random intercept for every participant. Note that when the underlying model is a Linear Mixed Model, the denominator degrees of freedom are approximated (here with Satterthwaite’s method, as the table header says), so they can come out as non-integers.
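
The p-value is just the tail probability of the F distribution with those two degrees of freedom, which you can check directly in R:

# Probability of an F-ratio of 29.75 or more, with 2 and 495 degrees of freedom
pf(29.75, df1 = 2, df2 = 495, lower.tail = FALSE)
# agrees with the Pr(>F) for luck in the table, up to rounding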

The degrees of freedom must always be reported alongside an F-ratio, as in: ‘There was a significant effect of luck (F(2, 495) = 29.75, p < 0.001)’. Note, by the way, that a significant ANOVA test tells us nothing about the direction of the differences (or which groups differed from which others), only that there were some differences in the mean of the DV according to the level of luck. In any write-up, the sentence I have just given would need to be followed up with further information: the means of the DV observed at each level of luck, or the individual coefficients from the model.
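
One convenient way to get that follow-up information is with the emmeans package; this is a sketch assuming emmeans is installed and m2 is our fitted model (simply tabulating the raw group means would also do).

library(emmeans)
# Estimated marginal means of the DV at each level of luck, from the model
emmeans(m2, ~ luck)
# Pairwise comparisons: which levels of luck differ from which others
pairs(emmeans(m2, ~ luck))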

One more thing: there are three ways of calculating the variance estimates that go into an F-ratio, known as type 1, type 2, and type 3. The differences concern whether (and in what order) other variables in the model are taken into account when calculating the variance estimates for a given variable. Under many circumstances, including here, it makes no difference: the types agree when there is a single predictor variable, or when the predictor variables are perfectly uncorrelated, which is often the case in experimental designs where you assign participants to all possible combinations of the different IVs. In other circumstances, though, the different types can produce different answers. Traditional statistics packages often used type 3, but more recent guidance is to use type 2 (Langsrud, 2003). If you use type 3, you will also need to recode your qualitative variables using contrast coding, something that is easy to do, but which I will not go into here. So, we will use type 2.
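
For completeness, here is a sketch of what the type 3 route would involve. The formula, data frame, and variable names below are illustrative guesses at the shape of m2, not the actual model code; the point is simply that the factors need sum-to-zero contrasts before type 3 tests are meaningful.

library(lmerTest)
# Refit with sum-to-zero (deviation) contrasts; dv, dat, and participant are
# illustrative stand-ins for whatever m2 actually contains
m2_sum <- lmer(dv ~ luck * het + (1 | participant), data = dat,
               contrasts = list(luck = contr.sum, het = contr.sum))
anova(m2_sum, type = 3)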

References

Langsrud, Ø. (2003). ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing, 13(2), 163–167. https://doi.org/10.1023/A:1023260610025