1.5 Causality and causal identification
The thing you are trying to find out about in a study is called the estimand (Lundberg et al., 2021). The estimand can be distinguished from the estimator, which is the statistic you will calculate to try to get at the estimand; and the estimate, which is the value of the estimator that you actually find in your study. For example, your estimand could be the difference in depression when people do physical activity instead of not doing it (and nothing else changes); the estimator could be the difference in average score on the Hamilton Depression Rating Scale between two groups of people who have been randomly assigned to do different levels of physical activity; and the estimate could be -6.7 (or whatever).
In science, research questions are usually causal questions if you drill down far enough. The question ‘is physical activity effective at reducing depression?’ can be rephrased using the language of causality as ‘does doing physical activity cause the symptoms of depression to reduce?’. A causal question is a question about a counterfactual scenario. Would my depressive symptoms be less in the counterfactual scenario where I was doing physical activity compared to the current case where I am not? Thus, the estimand in a study is usually the change in expected value of some outcome variable when the level of a predictor variable changes (and nothing else does). Such a change is also called an average causal effect (ACE). The ACE of physical activity on depressive symptoms is the expected difference in symptoms when people do activity versus not. It is an average causal effect because some people might improve and some might not: we are interested in whether, for the average person, activity makes a difference.
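The idea of an average causal effect can be made concrete with a small simulation. The sketch below uses entirely made-up numbers (a true effect of -6 points, for illustration only): each simulated person has two potential outcomes for depressive symptoms, one if they do physical activity and one if they do not, and the ACE is the average of the individual differences.

```python
import random
import statistics

random.seed(1)

# Hypothetical simulation with made-up numbers: each person has two
# potential outcomes for depressive symptoms -- one without activity (y0)
# and one with activity (y1). Most people improve, some do not.
people = []
for _ in range(10_000):
    y0 = random.gauss(20, 4)         # symptoms without activity
    y1 = y0 + random.gauss(-6, 3)    # individual causal effects vary
    people.append((y0, y1))

# The ACE is the mean of the individual causal effects (y1 - y0).
ace = statistics.mean(y1 - y0 for y0, y1 in people)
print(round(ace, 1))                 # close to the true average effect of -6
```

In real data, of course, you only ever observe one of the two potential outcomes for each person; that is why the counterfactual framing matters.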
If the quantity of epistemic interest in your study is an ACE, you have to ask whether that ACE is identifiable in the dataset you plan to gather. Identifiability means that the magnitude and direction of the ACE you care about can be unambiguously determined given the data you are going to have.
The concept of causal identifiability helps us see why we have to be more cautious in interpreting the results of observational studies than experimental ones. Say you do the observational study on physical activity and depressive symptoms. The people who are doing more physical activity could differ in many ways from the people doing less. They could be richer, they could live nearer a gym, they could have more friends encouraging them, they could be more likely to have a partner, and so on. These variables are called confounding variables, since they confound our ability to see the ACE we are interested in, by creating associations between predictor and outcome for other reasons.
The difference in depressive symptoms between people who do more physical activity and people who do less represents a big muddle of the ACE of interest (the effect of activity on depression) and the effects of all the confounding variables. You can get closer to causal identification by ‘controlling’ statistically for the confounding variables. This means adjusting for the confounders (as long as you have measured them) in your data analysis. We will see how to do this later (see section 3.4). Statistical control is not at all failsafe, though: there can be confounding variables that you have not measured, or have not even thought of. It is no substitute for actually doing an experiment. This is why your causal inference from observational data must be extremely cautious.
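A small simulation shows both the muddle and what statistical control does. The numbers are invented for illustration: ‘support’ (say, having encouraging friends) is a confounder that makes activity more likely and independently reduces symptoms, while the true causal effect of activity is -4 points. Here the adjustment is done by stratification (comparing within levels of the confounder), the simplest form of statistical control.

```python
import random
import statistics

random.seed(2)

# Hypothetical observational data with made-up numbers: 'support' is a
# confounder. It makes physical activity more likely AND independently
# reduces depressive symptoms. The true causal effect of activity is -4.
rows = []
for _ in range(20_000):
    support = random.random() < 0.5
    active = random.random() < (0.8 if support else 0.2)
    symptoms = 20 - 4 * active - 5 * support + random.gauss(0, 2)
    rows.append((support, active, symptoms))

def group_mean(support_levels, want_active):
    return statistics.mean(
        s for sup, act, s in rows
        if sup in support_levels and act == want_active
    )

# Naive comparison: mixes the effect of activity with the effect of
# support, because active people are also more likely to have support.
naive = group_mean({True, False}, True) - group_mean({True, False}, False)

# 'Controlling' for the measured confounder: compare active vs inactive
# within each support stratum, then average the two strata.
adjusted = statistics.mean(
    group_mean({level}, True) - group_mean({level}, False)
    for level in (True, False)
)
print(round(naive, 1), round(adjusted, 1))  # naive overstates; adjusted near -4
```

The adjustment only works here because the confounder was measured and included. An unmeasured confounder would leave the naive muddle in place, which is exactly why statistical control is no substitute for an experiment.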
Experimental studies get closer to causal identification than observational ones because the researcher manipulates the IV, setting it to either no activity or high activity. This is close to the counterfactual of epistemic interest. Because of random assignment, nothing else should differ between the groups, and the DV should move by exactly the ACE. However, even in experiments you need to be very careful not to introduce confounds that undermine causal identification. For example, in your experimental study, you might decide to have a trainer phone up the participants in the physical activity group once a week to ask them what workouts they are doing, and advise them in case of any problems. But now your two groups differ in two ways: one group is doing more physical activity than the other and also gets a weekly call from an encouraging trainer who is interested in them. So the difference in the DV will reflect the ACE of interest, plus the effect of getting the call. The ACE of interest is no longer identifiable, because it is confounded: any group difference in depressive symptoms could be due to the exercise or to the call. To protect causal identifiability, you need to think of a way of having the call, just as encouraging and just as long, in the no activity group as well.
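The power of random assignment can be seen by rerunning the same kind of made-up scenario as an experiment. The confounder (‘support’) still exists and still affects symptoms, but activity is now assigned by coin flip, so it is independent of support and the simple group difference identifies the ACE. All numbers are illustrative.

```python
import random
import statistics

random.seed(3)

# Hypothetical experiment with made-up numbers: 'support' still affects
# symptoms, but activity is assigned at random, so the two groups do not
# differ systematically in anything except activity. True effect is -4.
treated, control = [], []
for _ in range(20_000):
    support = random.random() < 0.5
    active = random.random() < 0.5          # random assignment (coin flip)
    symptoms = 20 - 4 * active - 5 * support + random.gauss(0, 2)
    (treated if active else control).append(symptoms)

diff = statistics.mean(treated) - statistics.mean(control)
print(round(diff, 1))                       # close to the true ACE of -4
```

Notice that no adjustment was needed: randomisation balances support (and every other variable, measured or not) across the groups in expectation. The trainer's weekly call would break this by making the groups differ in a second way.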
This principle is sometimes known as the law of one variable in experimental design: your treatments need to differ in the quantity of interest, and not in any other way. The law of one variable is quite demanding. It means you need to design texts and tasks that are exactly as long, and exactly as specific, in all treatments. In medical studies, it sometimes necessitates sham surgery, in which a surgeon opens the patient up, does nothing, and sews them back up again. In one famous study, it required the experimenter to cut the tail feathers off some birds and then stick them straight back on again (Møller, 1988).