2.4 The Behavioural Inhibition dataset
For the rest of this chapter and the next, we are going to use the data from a published paper to do our first data analysis (Paál et al., 2015). The paper reports a study on behavioural inhibition. Behavioural inhibition is the capacity to inhibit a response that you might otherwise make automatically. For example, let us say that you are in the habit of checking social media every time you pick up your phone. You exhibit behavioural inhibition if you pick up your phone and don’t check social media, for example because you have decided to have a weekend free of social media. Your behavioural inhibition has failed if you find yourself checking anyway.
In the study, people’s capacity for behavioural inhibition was measured using a task called the stop signal reaction time (SSRT) task. In this task, participants sit at a computer and have to respond as quickly as they can (with the appropriate key) when a square or circle appears on their screen. On a minority of trials, an audible tone (the stop signal) is played just before the square or circle appears. When the tone is played, the participant has to not respond at all (to display behavioural inhibition).
In the task, participants complete over two hundred trials. When the stop signal is played, the time gap between it and the appearance of the square or circle is varied. Sometimes the participant succeeds in inhibiting the response, and sometimes they fail, because they were too far down the road to pressing a key. From this, it is possible to estimate how much time the participant needs on average in order to successfully inhibit their response. This variable is called the SSRT. It is measured in milliseconds. A person with a large SSRT is not so good at behavioural inhibition. They can only manage to stop the response if they have a lot of forewarning. A person with a small SSRT is good at behavioural inhibition. They can shut down the response even if the instruction to do so comes very late.
You can find the paper open access online (https://peerj.com/articles/964/) to get more information on what the study is about. This is an experimental study with 58 participants, where the DV is the SSRT, and the IV is mood: the researchers tried to put people into either a neutral mood or a negative one, by having them read either a sequence of neutral statements or a sequence of negative ones, prior to completing the SSRT task. The manipulation of mood was between subjects. The main estimand of the study is the ACE of negative mood on SSRT, with the hypothesis that SSRTs will be higher in the negative condition (that is, that bad mood will make behavioural inhibition worse). An additional hypothesis of the study is a non-experimental one: that people who grew up in more deprived socioeconomic conditions will have higher SSRTs. Childhood socioeconomic conditions were measured using the postcodes of the participants’ childhood addresses.
In the rest of the chapter, we are going to:
find the data and load it in;
sense check the data;
calculate descriptive statistics;
and do some quick plots to help understand what is going on.
2.4.1 Getting the data
The data are published online with the paper. In the online version (https://peerj.com/articles/964/) scroll down to Supplemental Information, which is after the Discussion but before the References, and you will see a link to the data file. Most papers these days publish their raw data, often in a separate repository with a link from the online version of the paper. In this case, you just access it directly from the online version of the paper. In the Supplemental Information, click on the link to the data. You should see 58 rows of data, preceded by a row of column names. The precise URL of this data file is: https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv. Because this is so long, I have set up a shortcut URL: https://bit.ly/inhibitiondata.
Open a new script in RStudio. Start it with a comment so that you know what script it is, and load in the tidyverse package:
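A minimal sketch of how the top of the script might look. The wording of the comment is only an illustration; write whatever will help you recognise the script later.

```r
# Analysis of the behavioural inhibition data (Paál et al., 2015)
# (The comment text above is just an example.)

# Load the tidyverse packages.
# If this fails, install them once with install.packages("tidyverse").
library(tidyverse)
```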
The next thing to do is save this script (File > Save or the little disk icon). Set your working directory appropriately so that you can find it again.
The next line you need in the script is the one that reads in the data (you can use copy and paste from your browser to get the long URL):
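A sketch of that line, assuming the file is read with read_csv() from tidyverse and stored under the name d (the name used in the rest of this section):

```r
library(tidyverse)

# Read the data file directly from the URL given above and store it as d
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv")
```

The shortcut URL can be put in place of the long one, provided the redirect is still active.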
Run your script. You should receive some confirmation output, and then, if you go to the Environment tab of the window at top right of your screen, you should see that an object called d has appeared. This is a crucial type of object called a data frame. (In fact, this one belongs to a specific subclass of data frames called specialised tibbles, but, hey.) A data frame contains a bunch of vectors of the same length, each representing a variable, and each with a name. You can think of your data frame as like a spreadsheet where each column represents a variable and each row a case (here, a participant). To see your d laid out in this way, try View(d).
Note that it was completely arbitrary that I called this data frame d; if you want you can call it b or behavioural_inhibition_data, or whatever you want.
2.4.2 Sense checking the data
The first thing you always want to do when you load in a data frame is check that you are getting what you expected, understand what the variables are, and do any tidying up that is needed prior to the analysis. It is worth spending some time on this step, to make sure you understand exactly what you have, and spot any oddities. You can start by getting a list of the variable names in your data frame using the function colnames().
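As a sketch, assuming the data frame is called d and was read in with read_csv() (the first two lines repeat the loading step so that the block runs on its own):

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv")

# List the names of all the columns (variables) in d
colnames(d)
```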
## [1] "...1" "Experimenter"
## [3] "Sex" "Age"
## [5] "Deprivation_Rank" "Deprivation_Score"
## [7] "Mood_induction_condition" "Initial_Mood"
## [9] "Final_Mood" "SSRT"
## [11] "GRT" "Correct_responses_go_trials"
## [13] "Response_Probability"
We can see some things we are looking for, such as SSRT (the DV), Mood_induction_condition, which is presumably the IV, Age and so forth. The first column has a bit of a weird name (...1), so let’s investigate what this is. You can address a column within your data frame using the notation $, so that the code below means ‘the column called ...1 in the data frame d’. Try:
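A sketch of that call, assuming d was loaded with read_csv() as above. The backticks are needed because ...1 is not a syntactically valid R name, so it cannot follow $ directly.

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv")

# Print the contents of the oddly named first column
d$`...1`
```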
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58
You can see that this first column is just the participant number. You should always have variables that are comprehensibly labelled (for your own benefit as well as the benefit of your readers). So let’s change this to a better column name.
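One way to do this, sketched in base R. The new name Participant_Number is my own choice for illustration; any clear label will do.

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv")

# Replace the name of the first column with something more informative
# (Participant_Number is an illustrative name, not one taken from the paper)
colnames(d)[1] <- "Participant_Number"
```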
Now run colnames(d) again; you should see the new name.
Now let’s think about the IV, the mood induction condition (neutral or negative) that the participants were assigned to. Because assignment was at random, about half the participants should have been in the neutral condition and about half in the negative one. So let’s check that this is about right. Let’s make a table of the values to see that we understand.
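A sketch of this check, assuming the base R function table() is applied to the condition column of d:

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv")

# Count how many participants have each value of the condition variable
table(d$Mood_induction_condition)
```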
##
## 1 2
## 30 28
Hmm. We have about half the cases (30) with a value of 1 and the other half (28) with a value of 2. That makes sense, but it is bad coding by the researchers, because we don’t know which condition is represented by 1 and which by 2. (Don’t do this kind of thing, by the way; use the words ‘Negative’ and ‘Neutral’ to avoid this kind of uncertainty.) How are we going to find out which is which?
We know from the paper that the researchers measured people’s mood on a scale at the beginning of the experiment, and again at the end. This is represented in the dataset by the variables Initial_Mood and Final_Mood. So, we ought to be able to tell which are the group who had the negative mood induction because their mood will have got worse over the course of the experiment. So, first let us make a new variable which is the change in mood over the course of the experiment. We do this using the mutate() function. (You will need tidyverse loaded in, remember.)
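A sketch of this step, assuming the new variable is called Difference_Mood and is computed as final mood minus initial mood, so that a negative value means mood got worse:

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv")

# Add a column holding the change in mood, then store the result back in d
d <- mutate(d, Difference_Mood = Final_Mood - Initial_Mood)
```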
Let’s unpack this code. Look at the right-hand side first. We are modifying the data frame d (using the mutate() function) by adding a new variable, called Difference_Mood, which is computed by subtracting the value of Initial_Mood from Final_Mood. We are then assigning this new improved data frame to object d, thus updating the d we had. Once you run this, you should see that a 14th column has indeed appeared in the data frame with the appropriate name.
I am going to show you a different way of achieving this same result, because it will be useful later. You can chain together different operations on the same object in R (with tidyverse) by using the symbol %>%. This is referred to as ‘pipe’, and informally, I think of it as ‘and then…’. So to make our new variable Difference_Mood, we could do:
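A sketch of the piped version, under the same assumption that Difference_Mood is final mood minus initial mood:

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv")

# Take d, and then mutate it; the result is assigned back to d
d <- d %>% mutate(Difference_Mood = Final_Mood - Initial_Mood)
```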
Think of the right-hand side of this as something like ‘take d, and then mutate it with a new variable as defined’. We will see more examples of piping later.
Ok, so we are still trying to find out which experimental condition is 1 and which is 2. What we know is that the group whose mood has got worse (negative value of Difference_Mood) is probably the negative one. This means we need summary statistics (for example the mean) of the Difference_Mood variable.
First, let me show you how to get the descriptive statistics of a variable:
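A sketch of this call, assuming Difference_Mood has been created as final mood minus initial mood (the first lines repeat the earlier steps so that the block runs on its own):

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv") %>%
  mutate(Difference_Mood = Final_Mood - Initial_Mood)

# Take d, and then summarise it with the mean (M) and standard deviation (SD)
d %>% summarise(M = mean(Difference_Mood), SD = sd(Difference_Mood))
```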
## # A tibble: 1 × 2
## M SD
## <dbl> <dbl>
## 1 -0.3276 10.36
As you can see, we get a little table. The average change of mood over the course of the experiment was close to zero (mood is measured on a 100-point scale, so -0.328 is almost no change). And there was some variation between participants in this, a standard deviation of about 10. To explain the line of code we used to do this, we took d, and then we used a function summarise(), asking in particular for a summary variable M which would be the mean change in mood, and a variable SD which would be the standard deviation.
But this is not what we wanted. We wanted the descriptive statistics by experimental condition. To get this, we need an extra function, group_by(), in our line of code.
d %>% group_by(Mood_induction_condition) %>%
  summarise(M = mean(Difference_Mood), SD = sd(Difference_Mood))
## # A tibble: 2 × 3
## Mood_induction_condition M SD
## <dbl> <dbl> <dbl>
## 1 1 -3.233 5.276
## 2 2 2.786 13.31
Read this as ‘take d, and then group it by Mood_induction_condition, and then summarise it, giving us the mean and standard deviation of Difference_Mood’.
Looking at the output, group 1’s mood went down slightly over the course of the experiment. Group 2’s mood went up a little bit on average, and this group also showed a lot more variability in their change in mood. So, I think it is safe to assume that group 1 were the negative mood induction group, and group 2 were the neutral mood induction group. By the way, this already makes us think that the mood induction manipulation was not very effective: a 3-point reduction in mood on a 100-point scale seems pretty small.
Let’s make a better version of the Mood_induction_condition variable that actually uses the names of the conditions rather than just numbers:
d <- d %>%
  mutate(Condition = case_when(
    Mood_induction_condition == 1 ~ "Negative",
    Mood_induction_condition == 2 ~ "Neutral"
  ))
What this is doing is taking d and then mutating it, to make a new variable called Condition, which takes the value ‘Negative’ when Mood_induction_condition is equal to 1, and ‘Neutral’ when it is equal to 2. Let’s check that it has worked:
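A sketch of the check, assuming Condition has been created with case_when() as described (the first lines repeat that step so that the block runs on its own):

```r
library(tidyverse)
d <- read_csv("https://dfzljdn9uc3pi.cloudfront.net/2015/964/1/data_supplement.csv") %>%
  mutate(Condition = case_when(
    Mood_induction_condition == 1 ~ "Negative",
    Mood_induction_condition == 2 ~ "Neutral"
  ))

# Count the participants in each named condition
table(d$Condition)
```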
##
## Negative Neutral
## 30 28