7.4 ANOVA for a General Linear Model
The case we have examined thus far was an ANOVA table based on a Linear Mixed Model. Often, you will want an ANOVA table based on a General Linear Model instead. R has a base function for an ANOVA table, aov(). Perversely, it will only give you F-ratios based on type 1 calculations. You therefore need to use the contributed package car. This package contains a function Anova() that will give you all three types. (Note: Anova() with a capital A, to distinguish it from the base R function anova().) The rest of this section shows you how to get an ANOVA table for a General Linear Model in this way.
Go to the repository at https://osf.io/xrqae/ and get the data file ‘study7.data.csv’. You know the deal by now: save the file into your working directory, and then read it into a data frame called d7.
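The command for this step is not reproduced here. A minimal sketch, assuming base R's read.csv() is an acceptable reader and that the data frame should be called d7 (the name used by the model code later in this section):

```r
# Assumption: the file has been saved in the current working directory
data_file <- "study7.data.csv"

if (file.exists(data_file)) {
  # d7 is the data frame name used by the model code in this section
  d7 <- read.csv(data_file)
  # Quick look at the three variables used in the model
  str(d7[, c("Condition", "leftright", "redistlevel")])
} else {
  message("Cannot find ", data_file, " in ", getwd())
}
```

If you have been reading data with a different function in earlier sections, use that instead; nothing below depends on the reader.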
Now run install.packages('car') if you have not installed the package before, and then load it with library(car).
The DV in study 7 was once again the percentage of the harvest that the participant thought should be shared out between the villages (in the data, variable redistlevel). The IV (Condition) was again the importance of luck in the production of food, with three levels: High (participant is told luck is important); Low (participant is told luck is not very important); and Unspecified (participant is not told anything about the role of luck). There was only one IV in this experiment, but we are going to include a non-manipulated continuous covariate in our model too. (ANOVA can also handle continuous predictor variables, even though it is more often used for experimental IVs that are categorical.) This is the participant’s political orientation on the left to right axis (variable leftright). The predictions of the study were that three things would make a difference to how much the participant thought should be shared out: how important luck was; whether the participant identified as left or right wing; and (maybe) some interaction between these two predictors.
Let’s make Condition a factor with Unspecified the first (and hence reference) level; centre leftright; and fit the model.
d7$Condition <- factor(d7$Condition, levels=c("Unspecified", "Low", "High"))
d7$leftright.c <- d7$leftright - mean(d7$leftright, na.rm=TRUE)
s1 <- lm(redistlevel ~ Condition*leftright.c, data=d7)
summary(s1)$coefficients
##                             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)                  39.3936     0.9514  41.407 3.10e-263
## ConditionLow                  0.3010     1.3449   0.224  8.23e-01
## ConditionHigh                 4.0201     1.3427   2.994  2.79e-03
## leftright.c                  -0.0970     0.0411  -2.363  1.82e-02
## ConditionLow:leftright.c      0.0269     0.0580   0.464  6.43e-01
## ConditionHigh:leftright.c    -0.0172     0.0578  -0.297  7.67e-01
Now let’s get the ANOVA table, type 2, by passing the fitted model to Anova() and asking for type 2 (for example, Anova(s1, type=2)).
## Anova Table (Type II tests)
##
## Response: redistlevel
##                       Sum Sq   Df F value  Pr(>F)
## Condition               5997    2    5.57  0.0039 **
## leftright.c             8499    1   15.78 7.4e-05 ***
## Condition:leftright.c    318    2    0.30  0.7443
## Residuals             961317 1785
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
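To see why the type of calculation matters, here is a minimal sketch on simulated, unbalanced data (not the study 7 file; the data frame sim, its variables and all the effect sizes are invented for illustration). With the sequential (type 1) sums of squares from base R's anova(), the result for a predictor depends on the order in which the terms were entered; with type 2 it does not. The sketch assumes car is installed and skips the Anova() comparison otherwise.

```r
# Simulated stand-in data: every value here is made up
set.seed(42)
n <- 200
cond <- factor(sample(c("Unspecified", "Low", "High"), n, replace = TRUE,
                      prob = c(0.5, 0.3, 0.2)),
               levels = c("Unspecified", "Low", "High"))
# Covariate correlated with condition, so the predictors are not orthogonal
x <- rnorm(n, mean = ifelse(cond == "High", 1, 0))
y <- 40 + 4 * (cond == "High") - 2 * x + rnorm(n, sd = 5)
sim <- data.frame(y, cond, x)

# The same two predictors, entered in opposite orders
m_ab <- lm(y ~ cond + x, data = sim)
m_ba <- lm(y ~ x + cond, data = sim)

# Type 1 (sequential): the sum of squares for cond depends on the order
ss_first <- anova(m_ab)["cond", "Sum Sq"]
ss_last  <- anova(m_ba)["cond", "Sum Sq"]
print(c(entered_first = ss_first, entered_last = ss_last))

# Type 2: the sum of squares for cond is the same whichever order was used
if (requireNamespace("car", quietly = TRUE)) {
  ss_t2_ab <- car::Anova(m_ab, type = 2)["cond", "Sum Sq"]
  ss_t2_ba <- car::Anova(m_ba, type = 2)["cond", "Sum Sq"]
  print(c(first_fit = ss_t2_ab, second_fit = ss_t2_ba))
}
```

In a model with main effects only, the type 2 sum of squares for a term is the one you get from the sequential calculation when that term is entered last.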
So, we have a significant main effect of Condition, a significant main effect of leftright, and no significant interaction. You would of course need to follow this up by working out which levels of Condition differed from which others (judging by the coefficients above, High is different from Unspecified, whereas Low is not), and which direction the association between leftright and redistlevel goes (no surprises there: people who identify as more right-wing think less of the harvest should be redistributed).
Model s1 is a case where the conclusion you would draw if you based inference on individual parameter estimates would depend a lot on your choice of reference category for Condition, and also on whether or not you centre leftright. You can verify this for yourself by switching reference categories for Condition with the relevel() function, and by using the centred or un-centred version of leftright.
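A minimal sketch of that check, run on simulated data rather than the study 7 file so that it is self-contained. The data frame sim, its values and the model names m_unspec, m_low and m_raw are all made up for illustration; with the real data you would apply the same calls to d7.

```r
# Simulated stand-in for the study 7 data: every value here is invented
set.seed(1)
n <- 300
sim <- data.frame(
  Condition = factor(sample(c("Unspecified", "Low", "High"), n, replace = TRUE),
                     levels = c("Unspecified", "Low", "High")),
  leftright = runif(n, 0, 100)
)
sim$redistlevel <- 40 + 4 * (sim$Condition == "High") -
  0.1 * (sim$leftright - 50) + rnorm(n, sd = 20)

# Centred covariate, as in the text
sim$leftright.c <- sim$leftright - mean(sim$leftright)

# Same model with two different reference levels for Condition
m_unspec <- lm(redistlevel ~ Condition * leftright.c, data = sim)
sim$Condition2 <- relevel(sim$Condition, ref = "Low")
m_low <- lm(redistlevel ~ Condition2 * leftright.c, data = sim)

# Same model with the un-centred covariate
m_raw <- lm(redistlevel ~ Condition * leftright, data = sim)

# The individual coefficients differ between the parameterisations...
print(round(coef(m_unspec), 3))
print(round(coef(m_low), 3))
print(round(coef(m_raw), 3))

# ...but the fitted values and residual sum of squares are the same
print(all.equal(fitted(m_unspec), fitted(m_low)))
print(all.equal(deviance(m_unspec), deviance(m_raw)))
```

All three fits are the same model written in different ways, which is why the individual estimates and their p-values move around while the overall fit does not.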