1.1 Introduction

This first chapter sets out the fundamental conceptual building blocks that we need to have clear in our heads to take on data analysis: variables, experiments, designs, theories and hypotheses. Every subsequent chapter draws on the concepts introduced and discussed here. Even if many of the terms are familiar to you, I recommend you read it through.

Empirical science is, in some regards, simple. You start with research questions (do objects of different masses fall at the same speed? Is physical activity effective at reducing depression?), and you use data to move towards answers to those questions (yes, but only in a vacuum; yes, it seems to be so). Another way of looking at this is that we start out in ignorance (there are things we don’t know) and move towards knowledge (now we know them).

I say we move towards knowledge rather than we move to knowledge, because empirical knowledge is always provisional, and corrigible by further evidence. If, by knowledge, you mean beliefs that are so secure they are beyond doubt, then knowledge is the asymptotic state: we move closer and closer, but we never actually quite get there. Any single study constitutes only a small part of the body of evidence, which is distributed across many studies. As these accumulate and respond to one another, the evidence gets better and better, and so we collectively reduce our uncertainty more and more. But our uncertainty should never be zero, or science has hardened into dogma.

The way we move from questions towards answers in empirical science is through data analysis. Data are measurements of things that happen in the world: perhaps out there in the field, perhaps within the confines of your laboratory. You might cause the data to exist, by asking people to take part in an experimental study, or they might represent events and processes that were happening anyway, as when you use measurements of sea temperatures or a bank’s financial transactions. Either way, the data lie on your road from questions towards answers, and data analysis is what you do with those data to move along the road.

Data analysis was traditionally treated as a kind of boring add-on, something only thought about after the fun part of designing and executing the study was done, a necessary ritual chore to be gone through to please peer reviewers. Perhaps it was something to be farmed out to a professional statistician. Or perhaps you could remember enough from your tedious statistics course to do just enough data analysis yourself to get away with it.

This book, instead, puts data analysis centre stage. Data analysis is the central activity of empirical science: it is the very process of using your data to link your questions to tentative answers. It involves skill and judgement, sometimes even flair. It is not unduly difficult, given the tools we have freely available, as long as you are prepared to think carefully about it. It is highly enjoyable. You need to be able to do it yourself, even if what excites you in research is interviewing participants or standing in a lagoon watching fish.

Your thinking about data analysis cannot begin once the data are all gathered. Your approach to analysing your data is inseparable from what it is you want to find out and how you plan to do so. You don’t want to waste your time and effort gathering a load of data that cannot possibly answer your question, or that cannot discriminate between your actual hypothesis being correct and a slightly different hypothesis being correct. Coming up with a good study involves working backwards from the data analysis you need to do to answer your question. What would I need to measure to test the question I want to test? What pattern in the data would I see if one hypothesis is supported? What pattern would I see if the other is? What criterion would I apply to decide what the conclusion is? Study design is just bringing about the data set that allows you to do the data analysis you need to do to answer your question. Writing empirical scientific papers is just putting your data analysis in a communication wrapper. So, data analysis is a central activity. You need to learn to love it and become skilled at it.