Preface

This book consolidates the data analysis teaching I have given over the last decade; the training I have tried to give graduate students who have worked with me; and the lessons I have drawn from having done the data analysis for over a hundred studies, and having tried to do it better as I learned more.

It is almost a nutritionally complete statistics course, though it is very light on the underlying mathematics, and it goes rather fast. (It also does not cover Bayesian analysis at all; that would be an obvious next step after this book.) If you had no other training in statistics, you might manage to pick up the essential competences and even some advanced ones, especially if you were curious and willing to supplement what I say here with some searching of the web. I imagine the typical user will be someone who has already had some statistics training. They might have sat through an undergraduate statistics course (perhaps not retaining much other than that the Normal distribution is somehow important, there are lists of different statistical tests, and it’s good when you get a p-value with an asterisk next to it), and now want to feel more confident doing statistics for themselves, and understanding what they are doing and why.

Likewise, it gives a more or less complete introduction to coding in R, assuming no prior familiarity. Again, you should supplement with other sources: the web provides wonderful free R materials, both for general skills and specific use cases. Besides, there are so many more things you can do in R than I have been able to cover here. The exact detail of the code presented here is anyway not so important. Other people will do the same thing in different ways; R itself changes; and contributed packages come and go. Besides, increasingly people will use artificial intelligence resources to suggest or debug their actual lines of code. What I want people to take away is more a sense of the overall logic of coding, and how to think clearly about what they are trying to do, and what they have done.

The book is also an introduction to principles of good study design and open science. It covers such matters as operationalization, pre-registration, computational reproducibility, open data, and sound causal inference. These topics are often considered as somewhat separate from the data analysis itself, but in my view they constitute an indivisible whole: you need to know how you are going to analyse your data before you gather them, and you can’t know this until you know what you are trying to infer, and how you might infer it, and evidence your conclusions, robustly and credibly.

If I had to summarise what I was aiming for in writing this book, I would compare its intent to that of the hand-books for the mechanical arts that were published by skilled artisans in early modern Europe. These are described by Lorraine Daston in her book Rules (Daston, 2022). You could find such hand-books for all kinds of skills: painting with perspective; mining; starting a pig farm; dredging a canal; dyeing; cookery; musical composition; surveying; and besieging a town.

These hand-books have some characteristic features, which seem applicable to the present book too. They cover an area which is not an exact science, but not a free art or unsystematic craft either. Rather, the area contains some exact underpinning principles, and rules of good practice that can be formulated and transmitted, but must be applied with judgement and discretion. The hand-books were written by experienced practitioners in the domain, often autodidacts, who had a yearning to coalesce and pass on the lessons they had taught themselves over the years. I am a working empirical researcher who does his own data analysis, rather than a statistical mathematician or computer scientist. I am sure this shows in the book, for better or worse.

The hand-books were, above all, designed for readers who wished to practise the skill for themselves. They were written with the intention that the consumer would alternate between reading on the page, and trying out the suggestions for themselves. That is true here. I have provided links to datasets from real studies, and code suggestions for how to go about analysing them. You will get more out of the chapters if you try out the code examples yourself, make sure you can get the results I demonstrate, and play around with variations. Also, you will make most rapid progress with data analysis once you have data of your own to analyse. That is when the art starts to seem most relevant, and developing the skill most gratifying. The point when you are doing data analysis on your own data is the point where you understand how many different options there are, and the obstacles you can come up against. It is the point where you are most likely to want to refer back to a hand-book like this.

You can interact with this book as an online web version (https://bookdown.org/danielnettle2/data_analysis/), or the printed version. The web version has many advantages: it’s free, you can copy and paste snippets of code into your own R environment, and it contains hyperlinks for navigation between sections. On the other hand, if you want to have it on your shelf, carry it in your handbag, or read it on the beach, the printed copy is for you. There are almost no differences between the versions, and the layout of each is somewhat compromised for optimization of the other.

I would like to thank the amazing people who created R, RStudio, contributed R packages, bookdown, and the many accessible online reference and training sources for data analysis. All of these are available for free. They are part of an idealistic, indeed revolutionary, effort to improve human collective knowledge-building capacity without private profit. I would like to thank my students and collaborators over the years. Special thanks to Olha Guley, who co-taught a master’s data analysis class with me at the ENS; and to Melissa Bateson. Melissa and I took our first R course together years ago in Newcastle, and have worked closely together on many data analyses since then.

References

Daston, L. (2022). Rules: A short history of what we live by. Princeton University Press.