From Questions to Knowledge
Preface
1 Building blocks of empirical science
1.1 Introduction
1.2 Phenomena, constructs and operationalization
1.3 Variables
1.3.1 Categorical variables
1.3.2 Quantitative variables
1.3.3 Ordinal variables
1.3.4 Measured versus manipulated variables
1.4 Types of study
1.4.1 Observational studies
1.4.2 Experimental studies
1.4.3 Quasi-experimental studies
1.5 Causality and causal identification
1.6 Types of experiment
1.6.1 Between-subjects manipulation
1.6.2 Within-subjects manipulation
1.6.3 Within or between?
1.6.4 Study design
1.7 Theories, hypotheses and predictions
1.7.1 Exploratory and confirmatory research
1.8 How to do data analysis well enough
1.8.1 Analysis strategy and pre-registration
1.8.2 Reproducibility
1.9 Key concepts for this chapter
2 Introducing R and descriptive statistics
2.1 Introduction
2.2 What is R and why do we use it?
2.2.1 R and RStudio
2.2.2 R is a kind of anarchist utopia
2.2.3 Use R because you’re worth it
2.2.4 Working with R
2.2.5 Base R and contributed packages
2.3 Basics of R
2.3.1 The RStudio environment
2.3.2 Is anybody in?
2.3.3 Objects
2.3.4 Vectors as objects
2.3.5 Applying functions to objects
2.3.6 Classes
2.3.7 Assigning and logical checking
2.3.8 Installing and activating a contributed package
2.3.9 Scripts
2.3.10 Loading in data
2.4 The Behavioural Inhibition dataset
2.4.1 Getting the data
2.4.2 Sense checking the data
2.5 Descriptive statistics
2.5.1 Making some quick plots
2.6 Key concepts for this chapter
3 Inferential statistics and the General Linear Model
3.1 Introduction
3.2 Fundamentals of inferential statistics
3.2.1 Samples and populations
3.2.2 Models, parameter estimates, and confidence intervals
3.3 General Linear Models with a single predictor
3.3.1 Loading in the behavioural inhibition data, again
3.3.2 A first parameter estimate
3.3.3 Bringing in imprecision
3.3.4 A General Linear Model with a continuous predictor
3.4 General Linear Models with multiple predictors
3.4.1 A multiple-predictor model of behavioural inhibition
3.4.2 Which predictor variables should you include?
3.5 Assumptions of the General Linear Model
3.5.1 Checking assumptions: Distributions of residuals
3.5.2 Checking assumptions: Homogeneity of variance
3.5.3 Transformations
3.6 Key concepts for this chapter
4 Statistical tests: Difference and equivalence
4.1 Introduction
4.2 Null hypothesis significance tests
4.2.1 What does a significant p-value actually mean?
4.2.2 False positives
4.2.3 False negatives
4.2.4 Standardized effect sizes
4.2.5 Reporting the results of models
4.3 Equivalence tests
4.3.1 The region of practical equivalence (ROPE)
4.3.2 Equivalence tests and null hypothesis significance tests
4.3.3 How to do equivalence tests
4.3.4 When should you report equivalence tests?
4.4 Key concepts for this chapter
5 Making figures, and models with interactions
5.1 Introduction
5.2 A new dataset: Fitouchi and Nettle (2025)
5.2.1 Background to the study
5.2.2 Getting the data
5.3 Figure 1: Two experimental conditions
5.3.1 Refining your figure
5.3.2 Showing means and confidence intervals
5.3.3 Outputting your figure
5.4 Figure 2: A scatterplot
5.4.1 A first scatterplot
5.4.2 Mapping a variable to the colour of the points
5.4.3 Facetting
5.5 Models with interactions
5.5.1 Start with an additive model
5.5.2 Specifying and fitting an interactive model
5.5.3 An alternative way of specifying an interaction model
5.5.4 Interpreting the significance of main effects in models with interactions
5.6 When should you fit interactions?
5.6.1 Use interaction models to test for differences in association across levels of another variable
5.7 Key concepts for this chapter
6 Generalized Linear Models and Linear Mixed Models
6.1 Introduction
6.2 Generalized Linear Models: Theory
6.3 Generalized Linear Models: Empirical example
6.3.1 The “tomboy” study
6.3.2 Getting the data
6.3.3 Checking and a quick plot
6.3.4 Fitting the Generalized Linear Model
6.3.5 Interpreting the parameter estimate
6.3.6 Other families of Generalized Linear Model
6.4 Linear Mixed Models: Theory
6.5 Linear Mixed Models: Empirical example
6.5.1 The study by Nettle and Saxe (2020) and its data
6.5.2 Getting the data
6.5.3 Wide versus long format
6.5.4 Fitting the Linear Mixed Model
6.5.5 More complex random effects structures
6.6 Generalized and Mixed?
6.7 Key concepts for this chapter
7 Analysis of Variance
7.1 Introduction
7.2 Problematic hypothesis tests using individual coefficients
7.2.1 Reload the Nettle and Saxe data
7.2.2 First problem: No single test for a variable with three levels
7.2.3 Second problem: Main effects are hard to interpret when the model contains interactions
7.3 What ANOVA is and how it saves the day
7.4 ANOVA for a General Linear Model
7.5 Using ANOVA: a summary
7.6 Key concepts for this chapter
8 Model selection and model averaging
8.1 Introduction
8.2 The Akaike Information Criterion
8.3 Using model selection to test between multiple non-null hypotheses
8.3.1 The Changing Cost of Living Study
8.3.2 Getting the data and the MuMIn package
8.3.3 Setting up your models and doing the selection
8.3.4 Model averaging
8.4 Model selection for exploratory analysis
8.4.1 Setting up the data and model selection
8.5 Key concepts for this chapter
9 Sensitivity analysis
9.1 Introduction
9.2 Running a specification curve analysis
9.2.1 Load in the tomboy data, again
9.2.2 Specifying your specifications
9.2.3 Running the analysis
9.2.4 So what do we conclude in this case?
9.3 The place of sensitivity in your analysis strategy
9.4 Key concepts for this chapter
10 Evidence Synthesis and Meta-analysis
10.1 Introduction
10.2 Why vote-counting is bad
10.3 Introducing meta-analysis
10.4 Meta-analysis of the literature
10.4.1 Working out what data to extract
10.4.2 Extracting the data
10.4.3 Calculating the effect sizes
10.4.4 Running the meta-analysis
10.4.5 Reporting the meta-analysis
10.4.6 Systematic reviewing
10.5 Internal meta-analysis
10.5.1 Preparing the internal meta-analysis
10.5.2 Running the internal meta-analysis
10.5.3 What to do with internal meta-analysis
10.6 Meta-regression
10.7 Key concepts for this chapter
11 Simulation and Statistical Power
11.1 Introduction
11.2 Basics of simulating datasets
11.3 Writing functions
11.3.1 The definition statement
11.3.2 Adding defaults
11.3.3 Returning to our simulation
11.3.4 Making functions that call your functions
11.4 Statistical power
11.5 Using power analysis
11.5.1 Analytical solutions for power
11.5.2 The main ways of using statistical power
11.5.3 Sample size determination
11.5.4 Power determination or minimal detectable effect calculation
11.6 Sample size justification
11.7 Key concepts for this chapter
12 Good practices for data analysis
12.1 Introduction
12.2 Principles of good data analysis
12.2.1 Pre-registration
12.2.2 Pre-printing and results-blind publication
12.2.3 Sensitivity analysis
12.2.4 Internal replication
12.2.5 Open data and code
12.3 Choosing an analysis strategy
12.3.1 Experimental studies
12.3.2 Observational studies
12.4 Trimming your analysis strategy
12.4.1 Too many questions
12.4.2 Too many versions of the DV
12.4.3 Too many models
12.5 Being a good data citizen
12.5.1 Use transparent and consistent naming conventions
12.5.2 Section and comment your script
12.5.3 Make your scripts modular with a head
12.5.4 Consider a separate data wrangling script
12.5.5 Consider using R Markdown
12.5.6 Maximize local autonomy and minimize intermediate objects
12.6 Key concepts for this chapter
References
From Questions to Knowledge: Data Analysis for Psychology and Behavioural Science using R
Chapter 1
Building blocks of empirical science