5.4 Figure 2: A scatterplot
5.4.1 A first scatter plot
In figure 1, the x-axis was categorical (the two conditions) and the y-axis was continuous. What if we want to plot one continuous variable against another (a scatter plot)? Let’s make figure2, mapping the covariate puritanism_score to the x-axis and coop_score to the y-axis. I have already set the axis labels and the theme.
figure2 <- df7 %>%
ggplot(aes(x=puritanism_score, y=coop_score)) +
geom_point() +
theme_classic() +
xlab("Puritanism score") +
ylab("Change in cooperativeness")
figure2
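That looks OK. But maybe we want to add a fit line (called a smoother) to show the relationship between the two variables. There are various options for this, including non-linear ones, but we will go for a straight line fitted by a linear model (method="lm", which fits an ordinary least-squares regression of y on x). By default, geom_smooth() also draws a shaded band showing the 95% confidence interval around the line.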
figure2 <- df7 %>%
ggplot(aes(x=puritanism_score, y=coop_score)) +
geom_point() +
geom_smooth(method="lm") +
theme_classic() +
xlab("Puritanism score") +
ylab("Change in cooperativeness")
figure2
You should see that the line is pretty flat; there is not a strong relationship between the variables.
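The line drawn by geom_smooth(method="lm") is the same straight-line regression that lm() fits, so one way to put a number on "pretty flat" is to fit the model directly and look at the slope. The sketch below uses a small simulated data frame, sim, as a hypothetical stand-in for df7 (random values, not the study's data); with the real data you would use df7 instead. It also uses ggplot2's layer_data() to check that the plotted line matches the lm() predictions.

```r
library(ggplot2)

# Hypothetical stand-in for df7: simulated values, NOT the study's data
set.seed(1)
sim <- data.frame(puritanism_score = runif(40, 1, 7))
sim$coop_score <- 0.1 * sim$puritanism_score + rnorm(40)

# The same straight-line model that geom_smooth(method = "lm") fits
fit <- lm(coop_score ~ puritanism_score, data = sim)
coef(fit)  # intercept and slope; a slope near 0 means a flat line

# Build the plot and extract the data behind the smoother (layer 2)
p <- ggplot(sim, aes(x = puritanism_score, y = coop_score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
smooth_data <- layer_data(p, 2)

# lm() predictions at the x positions where the line was drawn
pred <- predict(fit, newdata = data.frame(puritanism_score = smooth_data$x))
max(abs(pred - smooth_data$y))  # effectively zero: same line
```

Passing formula = y ~ x explicitly only silences the message geom_smooth() prints when it chooses the formula itself; the fitted line is the same.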
5.4.2 Mapping a variable to the colour of the points
Figure 2 does not so far show that the study had two conditions. Half of the participants provided their coop_score in the indulgence condition, and half in the restraint condition. We know from figure 1 that the scores were pretty different in the two conditions, so maybe we should indicate on the graph which condition each point comes from. To make the point colour reflect the condition, we need to expand the aesthetic: as well as mapping the axes to variables (x to puritanism_score, y to coop_score), we want to map the colour of the points to the variable Condition. Let’s do that:
figure2 <- df7 %>%
ggplot(aes(x=puritanism_score, y=coop_score, colour=Condition)) +
geom_point() +
geom_smooth(method="lm") +
theme_classic() +
xlab("Puritanism score") +
ylab("Change in cooperativeness")
figure2
This is good. However, now not only are the points coloured according to condition, but there are also separate fit lines, coloured and fitted by condition. The lines seem to have different slopes, an issue which we will return to in section 5.5. But maybe this figure does not reflect what we wanted. We might have wanted a single overall fit line, with colour coding by Condition for the points only. To achieve this, it is important to appreciate that there is an overall aesthetic defined in the first line of the plot (the one with the ggplot() call), and this is inherited by all the geoms. But individual geoms can have their own geom-specific aesthetics too, which add to or override the overall aesthetic. So we can apply the colour mapping in geom_point() only, rather than overall. Try:
figure2 <- df7 %>%
ggplot(aes(x=puritanism_score, y=coop_score)) +
geom_point(aes(colour=Condition)) +
geom_smooth(method="lm") +
theme_classic() +
xlab("Puritanism score") +
ylab("Change in cooperativeness")
figure2
Here, the mapping of colour to Condition happens only for geom_point(); it is not inherited by geom_smooth(). Final change for this section: you may not like R’s default choice of colours when a discrete variable is mapped to colour. Try setting the colour scheme yourself by adding scale_colour_manual() to the plot (I won’t show the output here).
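Here is a sketch combining both ideas from this subsection, on simulated data (a hypothetical stand-in for df7; the level names "indulgence" and "restraint" and the two colour names are assumptions chosen for illustration). Colour is mapped only inside geom_point(), so the smoother is fitted once to all points, and scale_colour_manual() sets the point colours. layer_data() is used to confirm that the point layer is split into two groups while the smooth layer is not.

```r
library(ggplot2)

# Hypothetical stand-in for df7 (simulated, NOT the study's data);
# the level names "indulgence" and "restraint" are assumed here
set.seed(2)
sim <- data.frame(
  puritanism_score = runif(40, 1, 7),
  Condition = rep(c("indulgence", "restraint"), each = 20)
)
sim$coop_score <- rnorm(40) + ifelse(sim$Condition == "restraint", 1, 0)

figure2 <- ggplot(sim, aes(x = puritanism_score, y = coop_score)) +
  geom_point(aes(colour = Condition)) +          # colour applies to points only
  geom_smooth(method = "lm", formula = y ~ x) +  # one line through all points
  scale_colour_manual(values = c(indulgence = "darkorange",
                                 restraint = "steelblue")) +
  theme_classic() +
  xlab("Puritanism score") +
  ylab("Change in cooperativeness")

# Layer 1 (points) has one group per condition; layer 2 (smoother) has one
length(unique(layer_data(figure2, 1)$group))
length(unique(layer_data(figure2, 2)$group))
```

With the real data, the names in values must match the levels of Condition exactly; alternatively, an unnamed vector of colours is assigned to the levels in order.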
5.4.3 Facetting
This section presents an alternative to plotting the two conditions in different colours: facetting, a brilliant feature of plotting with ggplot() that lets you make smart-looking multi-panel figures, or see what is going on in different subsets of your data, with just one line of code. Basically, we make the simple plot we want to see for each level of Condition, and then add facet_wrap(~Condition). Try this:
figure2 <- df7 %>%
ggplot(aes(x=puritanism_score, y=coop_score)) +
geom_point() +
geom_smooth(method="lm", colour="black") +
theme_classic() +
xlab("Puritanism score") +
ylab("Change in cooperativeness") +
facet_wrap(~Condition)
figure2
I prefer this version to separating the two conditions by colour on the same facet. You can see everything: that the points are higher on the y-axis in restraint, and that the slopes are different. You can have a figure with more than two facets (when the facetting variable has more than two levels), or even a grid of facets formed by the combinations of two discrete variables, using facet_grid(variableA ~ variableB), where the levels of variableA form the rows and those of variableB the columns. You should play around with facetting, as it is very useful.
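Each panel's smoother is a separate lm() fit to that panel's subset of the data, so the observation that the slopes differ can also be checked numerically by fitting one model per level of Condition. A sketch on simulated data (a hypothetical stand-in for df7, not the study's data), using base R's split() and sapply():

```r
library(ggplot2)

# Hypothetical stand-in for df7 (simulated, NOT the study's data)
set.seed(3)
sim <- data.frame(
  puritanism_score = runif(60, 1, 7),
  Condition = rep(c("indulgence", "restraint"), each = 30)
)
sim$coop_score <- rnorm(60) +
  ifelse(sim$Condition == "restraint", 0.3, -0.3) * sim$puritanism_score

# One straight-line fit per condition: the same lines the facets draw
slopes <- sapply(split(sim, sim$Condition), function(d) {
  coef(lm(coop_score ~ puritanism_score, data = d))[["puritanism_score"]]
})
slopes  # named vector: one slope per condition

# The facetted plot: one panel per level of Condition
p <- ggplot(sim, aes(x = puritanism_score, y = coop_score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "black") +
  facet_wrap(~Condition)
length(unique(layer_data(p, 1)$PANEL))  # number of panels
```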
Now, you should output figure 2 to .png and .pdf files, as we did for figure 1. Think about the best dimensions (for example, the facetted version needs to be at least twice as wide as it is high).
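One way to write both files, assuming ggplot2's ggsave() (figure 1 may have been saved differently in your own script): give the width and height explicitly so the facetted layout gets the space it needs. The file names, sizes and the stand-in plot below are illustrative only.

```r
library(ggplot2)

# Minimal stand-in plot (built-in mtcars data) so this runs on its own;
# in your script, use your own figure2 instead
figure2 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~am)

# Written to a temporary folder here; use e.g. "figure2.png" to save
# into your working directory. Width is twice the height.
out_png <- file.path(tempdir(), "figure2.png")
out_pdf <- file.path(tempdir(), "figure2.pdf")
ggsave(out_png, plot = figure2, width = 16, height = 8, units = "cm", dpi = 300)
ggsave(out_pdf, plot = figure2, width = 16, height = 8, units = "cm")
```

ggsave() picks the file type from the extension, so the same call with a different file name produces the .png or the .pdf.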