1.6 Types of experiment

There are two major types of manipulated variable in psychology and behavioural science: variables that are manipulated between subjects and those that are manipulated within subjects. An experimental study all of whose IVs are manipulated between subjects is called, unsurprisingly, a between-subjects experiment. An experimental study all of whose IVs are manipulated within subjects is called a within-subjects experiment. An experimental study in which some IVs are manipulated between subjects and others within subjects is called a mixed-design experiment.

1.6.1 Between-subjects manipulation

In cases of between-subjects manipulation, a different group of participants experiences each treatment. Imagine we are studying whether drinking a sugar drink increases running speed. We form two groups, at random. Participants in one group drink a sugar drink. Participants in the other group drink a sugar-free drink that looks and tastes the same. All of the participants do a timed 100m run. We measure their times. The critical comparison, for data analysis purposes, is going to be between the people in the one group and the people in the other. Hence, ‘between subjects’.

1.6.2 Within-subjects manipulation

In cases of within-subjects manipulation, each participant experiences several different levels of the same IV. Instead of forming two groups, all the participants do the 100m run twice, once after drinking a sugar drink, and once after drinking a sugar-free drink. In this within-subjects design, the critical comparison from a data analysis point of view is of a participant to themselves when they are in the other treatment. That is, we are comparing Daniel when he has drunk sugar to Daniel when he has drunk a sugar-free drink. Hence, within-subjects comparison.

1.6.3 Within or between?

Within-subjects manipulation is generally superior. The reasons should be obvious. What we care about is the ACE of sugar on running performance. What better way could there possibly be than comparing the very same person in the case when they have drunk sugar to the case where they have not? Within-subjects manipulation is particularly advantageous when there are large differences between people in their baseline performance. In the general population, some people can run a lot faster than others, sugar or no (those differences could be of the order of 50% or even 100%). When you compare between different participants, those big differences are going to swamp the probably subtle effect of the sugar (which might be an increase in performance of 3%). But when you compare within individuals, those differences are accounted for by design.
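To make the swamping point concrete, here is a minimal simulation sketch in Python. All the numbers are assumptions chosen for illustration (a baseline 100m time of around 16 seconds, a standard deviation between people of 3 seconds, a sugar effect of half a second, and a little run-to-run noise), and the function names are invented. The sketch also ignores order effects. It repeats the running experiment many times under each kind of manipulation and reports how much the estimated effect bounces around from one experiment to the next.

```python
import random
import statistics

EFFECT = -0.5      # assumed: sugar makes people 0.5 s faster (about 3% of 16 s)
BASELINE_SD = 3.0  # assumed: large differences between people
NOISE_SD = 0.3     # assumed: small run-to-run noise within a person


def run_time(baseline, sugar, rng):
    """One simulated 100m time in seconds."""
    return baseline + (EFFECT if sugar else 0.0) + rng.gauss(0.0, NOISE_SD)


def one_experiment(n, rng):
    """Return (between-subjects estimate, within-subjects estimate) of the effect."""
    people = [rng.gauss(16.0, BASELINE_SD) for _ in range(2 * n)]
    # Between subjects: people are generated in random order, so taking the
    # first n as the sugar group amounts to random assignment.
    between = (statistics.mean([run_time(b, True, rng) for b in people[:n]])
               - statistics.mean([run_time(b, False, rng) for b in people[n:]]))
    # Within subjects: n people each run once under each treatment.
    within = statistics.mean(
        [run_time(b, True, rng) - run_time(b, False, rng) for b in people[:n]]
    )
    return between, within


def spread_of_estimates(n=20, reps=500, seed=1):
    """SD of the effect estimate across many simulated experiments."""
    rng = random.Random(seed)
    results = [one_experiment(n, rng) for _ in range(reps)]
    return (statistics.stdev([r[0] for r in results]),
            statistics.stdev([r[1] for r in results]))


if __name__ == "__main__":
    sd_between, sd_within = spread_of_estimates()
    print(f"between-subjects SD of estimate: {sd_between:.2f} s")
    print(f"within-subjects SD of estimate:  {sd_within:.2f} s")
```

Under these assumptions the within-subjects estimate varies far less from experiment to experiment, even though the simulated between-subjects experiment uses twice as many people (two groups of n rather than n people measured twice).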

Nonetheless, you do need to be careful with within-subjects manipulation. First, there can be important order effects: participants can get better or worse at the task with practice, and this means that the second time they do it will be different from the first for reasons other than the ACE you wish to identify. Thus, it is important to counterbalance the order of treatments (that is, half the participants do sugar first, then sugar-free, and the other half the other way around).
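Counterbalancing is easy to do by machine. The following is a minimal sketch, assuming you have a list of participant identifiers; the function name and the treatment labels are made up for the example. It shuffles the participants and gives the first half one order and the second half the other.

```python
import random


def counterbalance(participants, seed=None):
    """Randomly assign half the participants to each treatment order."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random order, so who gets which order is down to chance
    half = len(shuffled) // 2
    orders = {}
    for i, person in enumerate(shuffled):
        if i < half:
            orders[person] = ("sugar", "sugar-free")
        else:
            orders[person] = ("sugar-free", "sugar")
    return orders


if __name__ == "__main__":
    for person, order in counterbalance(["P1", "P2", "P3", "P4"], seed=42).items():
        print(person, "->", " then ".join(order))
```

With an odd number of participants, the extra person ends up in the sugar-free-first order in this sketch; you may prefer to recruit an even number.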

Second, if you do the treatments too close in time to one another, there can be a spillover. If you make people do a second 100m run one minute after the first, they might still be tired. If you put people on a healthy diet for 12 weeks, its benefits might linger for a long time after the 12 weeks are over. For these reasons, if you are manipulating a variable within subjects, you should choose an appropriate washout period (a time gap between treatments long enough for the value of the DV the second time to be unaffected by the first treatment).

If within-subjects manipulation is superior, why do we ever use between-subjects manipulation? There are many good reasons. A vaccine might provide lifelong protection. In this case, you cannot very well come back and give the person a placebo shot a year later. I was involved in research where we manipulated birds’ nutritional environment in the first few days after hatching (Nettle et al., 2017). You can’t rerun time and give an individual their first few days of life all over again. Sometimes it is implausible or odd to measure the DV again. If you want to measure an individual’s reaction the very first time they see a tiger, then by definition you cannot do this again with the same individual.

Within-subjects manipulation can also draw your participants’ attention to what you are looking for, in ways that could be problematic. Let’s say people have to do a difficult and unpleasant task, and then rate their stress level. One of the times they do it (but not the other), they see a face of a friend beaming at them from the screen. They can quickly work out that the experimenter is looking for them to be less stressed when the friend’s face is there. They might give lower stress ratings, in order not to disappoint what they figure the experimenter’s expectations are. This is called an experimenter demand effect. (Volunteer participants are generally a very helpful bunch who - for the most part - would be only too pleased to help you confirm your hypothesis if they knew what it was.) I have been asked to repeat experiments with between-subjects manipulation to rule out experimenter demand effects as the explanation for my finding. If you use between-subjects manipulation, each participant only sees one treatment, and so has less chance to see what the critical difference between treatments is, and hence figure out the hypothesis.

You may, then, end up having to manipulate your IV between subjects rather than within subjects. This is fine: the ACE is still identifiable between subjects. The reasons it is identifiable are different from the within-subjects case, though. With within-subjects manipulation, the ACE is identifiable because the only difference between the two conditions is which treatment the participants are getting (other than order, and you have counterbalanced for that). It’s the same people doing the same thing, but now they have had sugar instead of sweetener.

With between-subjects manipulation, many things are different between your two groups. They are different people. One group might contain someone who was a sprint champion at college. The ACE of interest is still identifiable, because those differences are not systematic. The chance of getting a former sprint champion in one group is the same as the chance of getting one in the other, because of random assignment. So if you make your groups big enough, then all the random differences in who ended up in which group are going to average out. The way mathematicians describe this is that your groups do not differ in expectation in any way other than the ACE of interest. You thus need to make your groups big enough to even out all the chance differences between people.
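You can watch the chance differences average out in a small simulation. This is a sketch with invented numbers (baseline 100m times with a mean of 16 seconds and a standard deviation of 3 seconds) and an invented function name. It forms two groups at random, before any treatment is given, and records how far apart their average baseline times are.

```python
import random
import statistics


def mean_imbalance(group_size, reps=200, seed=0):
    """Average absolute gap in baseline time between two randomly formed groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        # Assumed population of baseline 100m times, in seconds.
        people = [rng.gauss(16.0, 3.0) for _ in range(2 * group_size)]
        rng.shuffle(people)  # random assignment to the two groups
        group_a, group_b = people[:group_size], people[group_size:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(f"group size {size:>4}: typical baseline gap {mean_imbalance(size):.2f} s")
```

The gap between the groups shrinks as the groups get bigger, which is what it means for the groups not to differ in expectation.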

For between-subjects manipulation, then, you need much larger groups of participants, to average out all the variation between people. How much larger depends on how much consistent variation between individuals there is in your DV, but the difference required can be an order of magnitude or more. You need to plan for this. There is no point in doing a between-subjects experiment with twenty participants, but using within-subjects manipulation, this could be a reasonable sample size under some circumstances.
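A rough sense of the difference in required sample size can be had from the standard normal-approximation formula for comparing two means. The sketch below is an approximation under stated assumptions, not a substitute for a proper power analysis. The function name, the parameter names and the example numbers are all invented for illustration: sd is the standard deviation of the DV across people within one treatment, and rho is the correlation between the same person's two measurements (high when individual differences are consistent).

```python
from math import ceil
from statistics import NormalDist


def total_participants(effect, sd, rho, alpha=0.05, power=0.8):
    """Approximate total participants needed, as (between, within).

    Normal approximation for a two-sided test; ignores order effects,
    dropout and small-sample corrections.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Between subjects: two independent groups, each of this size.
    n_per_group = ceil(2 * z**2 * sd**2 / effect**2)
    # Within subjects: each person's difference score has
    # variance 2 * sd^2 * (1 - rho).
    n_within = ceil(z**2 * 2 * sd**2 * (1 - rho) / effect**2)
    return 2 * n_per_group, n_within


if __name__ == "__main__":
    # Assumed numbers: 0.5 s effect, 3 s SD between people, rho = 0.95.
    between, within = total_participants(effect=0.5, sd=3.0, rho=0.95)
    print("between-subjects total:", between)
    print("within-subjects total: ", within)
```

In this approximation the ratio of the two totals is roughly 2 / (1 - rho), so the more consistent the differences between individuals, the bigger the advantage of within-subjects manipulation.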

You can sometimes help yourself with the required sample size, if you do between-subjects manipulation, by matching the two groups. Matching on age and gender, for example, means that for every 32-year-old woman assigned to the sugar group, you assign a 32-year-old woman to the sugar-free group. By matching, you ensure that your groups are equivalent for age and gender (which could be related to running performance) not just in expectation, but in actuality. With well-matched groups, you may be able to see treatment effects with smaller sample sizes than with groups formed entirely at random. Be careful, however: the moment you match, you are allowing something other than chance to dictate group membership, and this could produce bias if you don’t do it carefully. For example, you could do a screening study where all the people who might take part in your trial come forward and fill in a demographic questionnaire. Then, of the people who have come forward, you randomly assign one to the sugar group. You then search the sample for another as close as possible to that one in age and gender, assign them to the sugar-free group, and repeat the procedure. This is only semi-random, but at least everyone comes from the same pool, was recruited in the same way, and could have ended up in any group.
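The matching procedure just described could be sketched as follows. This is one possible reading, not a standard algorithm: I have assumed that "as close as possible" means the same gender and the nearest age, and that anyone for whom no same-gender match remains is left out. The function name and the dictionary keys ('id', 'age', 'gender') are hypothetical.

```python
import random


def matched_assignment(pool, seed=None):
    """Form matched sugar and sugar-free groups from a screening pool.

    pool: list of dicts with hypothetical keys 'id', 'age' and 'gender'.
    """
    rng = random.Random(seed)
    remaining = list(pool)
    sugar, sugar_free = [], []
    while len(remaining) >= 2:
        # Randomly pick the next person for the sugar group.
        first = remaining.pop(rng.randrange(len(remaining)))
        # Assumption: a match must have the same gender.
        candidates = [p for p in remaining if p["gender"] == first["gender"]]
        if not candidates:
            continue  # no acceptable match left, so this person is left out
        # Among those, take the person closest in age.
        match = min(candidates, key=lambda p: abs(p["age"] - first["age"]))
        remaining.remove(match)
        sugar.append(first)
        sugar_free.append(match)
    return sugar, sugar_free


if __name__ == "__main__":
    pool = [
        {"id": 1, "age": 32, "gender": "F"},
        {"id": 2, "age": 33, "gender": "F"},
        {"id": 3, "age": 51, "gender": "M"},
        {"id": 4, "age": 49, "gender": "M"},
    ]
    sugar, sugar_free = matched_assignment(pool, seed=7)
    for a, b in zip(sugar, sugar_free):
        print(f"sugar: id {a['id']}  matched with sugar-free: id {b['id']}")
```

Note that within each matched pair, it is chance that decides who gets the sugar, since the first member of the pair is drawn at random.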

1.6.4 Study design

A study design is a succinct statement of how a study is supposed to work. You present the design statement whenever you write about your study, for example in the Methods section of a paper. The components of the design statement would typically include:

Whether it is observational or experimental.
If observational, whether it is cross-sectional or longitudinal, and what the main predictors and the main outcome variables are.
If experimental, what the IVs are, whether each one is manipulated within or between subjects, and how many levels each one can take. For example, a study with two IVs each of which has two levels is known as a 2 x 2 design.
If experimental, what the DV is (or DVs are, if there is more than one).
If experimental, whether all possible combinations of the IVs are presented. For example, with two IVs each of which has two levels, there are potentially a total of four experimental conditions. The table below shows an example, with an imaginary experiment where people have to do a task with their left or right hand (IV1) whilst in the light or in the dark (IV2). If all four possible conditions are presented, the design is called full factorial: every treatment of every IV is presented in combination with every treatment of every other IV. If there are only two IVs with two levels, designs are almost always full factorial, but once you get to larger numbers of IVs, or IVs with many more levels, there might be reasons for only presenting a subset of the possible conditions.

	In the light	In the dark
Left hand	Condition 1	Condition 2
Right hand	Condition 3	Condition 4

Note that if there is only one IV, then every level of that IV is a condition. The distinction between conditions and treatments collapses if there is only one IV.
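Listing the conditions of a full factorial design is mechanical: you take every combination of one level from each IV. Here is a minimal Python sketch using the hand and lighting example; the function name and the dictionary layout are my own choices for illustration.

```python
from itertools import product


def full_factorial(ivs):
    """Return every combination of levels, one level from each IV."""
    names = list(ivs)
    return [dict(zip(names, combo)) for combo in product(*ivs.values())]


if __name__ == "__main__":
    ivs = {
        "hand": ["left", "right"],      # IV1
        "lighting": ["light", "dark"],  # IV2
    }
    for number, condition in enumerate(full_factorial(ivs), start=1):
        print(f"Condition {number}: {condition}")
```

The conditions come out in the same order as in the table above (left hand in the light first, right hand in the dark last). Adding a third IV, or more levels, only requires adding entries to the dictionary, and the number of conditions is the product of the numbers of levels.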

References

Nettle, D., Andrews, C., Reichert, S., Bedford, T., Kolenda, C., Parker, C., Martin-Ruiz, C., Monaghan, P., & Bateson, M. (2017). Early-life adversity accelerates cellular ageing and affects adult inflammation: Experimental evidence from the European starling. Scientific Reports, 7(1), 40794. https://doi.org/10.1038/srep40794