18 Moderation

Moderation analysis is essential for understanding interaction effects—when the relationship between two variables depends on the value of a third. This chapter introduces the concept through real-world examples. After outlining common types of moderation (binary, continuous, hierarchical), the chapter walks through the key terminology, including moderators, focal predictors, and conditional effects. It covers the classic moderation model and introduces interaction terms in regression. Later sections delve into two-way and three-way interactions, providing detailed guidance on specification, estimation, and interpretation. Graphical methods for exploring interaction effects are emphasized, using interaction plots and marginal effects visualization. The chapter ensures readers are able not only to model interaction effects correctly but to communicate them clearly to non-technical stakeholders.


Moderation analysis examines how the relationship between an independent variable (\(X\)) and a dependent variable (\(Y\)) changes depending on a third variable, the moderator (\(M\)). In regression terms, moderation is represented as an interaction effect.

18.1 Types of Moderation Analyses

There are two primary approaches to analyzing moderation:

1. Spotlight Analysis

  • Also known as Simple Slopes Analysis.
  • Examines the effect of \(X\) on \(Y\) at specific values of \(M\) (e.g., mean, \(\pm 1\) SD, percentiles).
  • Typically used for categorical or discretized moderators.

2. Floodlight Analysis

  • Extends spotlight analysis to examine moderation across the entire range of \(M\).
  • Based on Johnson-Neyman Intervals, identifying values of \(M\) where the effect of \(X\) on \(Y\) is statistically significant.
  • Useful when the moderator is continuous and no specific cutoffs are predefined.

18.2 Key Terminology

  • Main Effect: The effect of an independent variable without considering interactions.
  • Interaction Effect: The combined effect of \(X\) and \(M\) on \(Y\).
  • Simple Slope: The slope of \(X\) on \(Y\) at a specific value of \(M\) (used when \(M\) is continuous).
  • Simple Effect: The effect of \(X\) on \(Y\) at a particular level of \(M\) when \(X\) is categorical.

18.3 Moderation Model

A typical moderation model is represented as:

\[ Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 X \times M + \varepsilon \]

where:

  • \(\beta_0\): Intercept

  • \(\beta_1\): Main effect of \(X\)

  • \(\beta_2\): Main effect of \(M\)

  • \(\beta_3\): Interaction effect of \(X\) and \(M\)

If \(\beta_3\) is significant, it suggests that the effect of \(X\) on \(Y\) depends on \(M\).

18.4 Types of Interactions

  1. Continuous by Continuous: Both \(X\) and \(M\) are continuous (e.g., age moderating the effect of income on spending).
  2. Continuous by Categorical: \(X\) is continuous, and \(M\) is categorical (e.g., gender moderating the effect of education on salary).
  3. Categorical by Categorical: Both \(X\) and \(M\) are categorical (e.g., the effect of a training program on performance, moderated by job role).

18.5 Three-Way Interactions

For models with a second moderator (\(W\)), we examine:

\[ \begin{aligned} Y &= \beta_0 + \beta_1 X + \beta_2 M + \beta_3 W + \beta_4 X \times M \\ &+ \beta_5 X \times W + \beta_6 M \times W + \beta_7 X \times M \times W + \varepsilon \end{aligned} \]

18.6 Additional Resources

  • Bayesian ANOVA models: BANOVAL package allows floodlight analysis.
  • Structural Equation Modeling: cSEM package includes doFloodlightAnalysis.

For more details, refer to (Spiller et al. 2013).

18.7 Application

18.7.1 emmeans Package

The emmeans package (Estimated Marginal Means) is a powerful tool for post-hoc analysis of linear models, enabling researchers to explore interaction effects through simple slopes and estimated marginal means.

To install and load the package:

install.packages("emmeans")

The dataset used in this section is sourced from the UCLA Statistical Consulting Group, where:

  • gender (male, female) and prog (exercise program: jogging, swimming, reading) are categorical variables.

  • loss represents weight loss, and hours and effort are continuous predictors.

library(tidyverse)
dat <- readRDS("data/exercise.rds") %>%
    mutate(prog = factor(prog, labels = c("jog", "swim", "read"))) %>%
    mutate(gender = factor(gender, labels = c("male", "female")))

18.7.1.1 Continuous by Continuous Interaction

We begin with an interaction model between two continuous variables: hours (exercise duration) and effort (self-reported effort level).

contcont <- lm(loss ~ hours * effort, data = dat)
summary(contcont)
#> 
#> Call:
#> lm(formula = loss ~ hours * effort, data = dat)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -29.52 -10.60  -1.78  11.13  34.51 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)   7.79864   11.60362   0.672   0.5017  
#> hours        -9.37568    5.66392  -1.655   0.0982 .
#> effort       -0.08028    0.38465  -0.209   0.8347  
#> hours:effort  0.39335    0.18750   2.098   0.0362 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 13.56 on 896 degrees of freedom
#> Multiple R-squared:  0.07818,    Adjusted R-squared:  0.07509 
#> F-statistic: 25.33 on 3 and 896 DF,  p-value: 9.826e-16
18.7.1.1.1 Simple Slopes Analysis (Spotlight Analysis)

Following Aiken and West (2005), the spotlight analysis examines the effect of hours on loss at three levels of effort:

  • Mean of effort plus one standard deviation

  • Mean of effort

  • Mean of effort minus one standard deviation

library(emmeans)
effar <- round(mean(dat$effort) + sd(dat$effort), 1)
effr  <- round(mean(dat$effort), 1)
effbr <- round(mean(dat$effort) - sd(dat$effort), 1)



# Define values for estimation
mylist <- list(effort = c(effbr, effr, effar))

# Compute simple slopes
emtrends(contcont, ~ effort, var = "hours", at = mylist)
#>  effort hours.trend    SE  df lower.CL upper.CL
#>    24.5       0.261 1.350 896   -2.392     2.91
#>    29.7       2.307 0.915 896    0.511     4.10
#>    34.8       4.313 1.310 896    1.745     6.88
#> 
#> Confidence level used: 0.95

# Visualization of the interaction
mylist <- list(hours = seq(0, 4, by = 0.4),
               effort = c(effbr, effr, effar))
emmip(contcont, effort ~ hours, at = mylist, CIs = TRUE)
# Test statistical differences in slopes
emtrends(
    contcont,
    pairwise ~ effort,
    var = "hours",
    at = mylist,
    adjust = "none"
)
#> $emtrends
#>  effort hours.trend    SE  df lower.CL upper.CL
#>    24.5       0.261 1.350 896   -2.392     2.91
#>    29.7       2.307 0.915 896    0.511     4.10
#>    34.8       4.313 1.310 896    1.745     6.88
#> 
#> Results are averaged over the levels of: hours 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast                estimate    SE  df t.ratio p.value
#>  effort24.5 - effort29.7    -2.05 0.975 896  -2.098  0.0362
#>  effort24.5 - effort34.8    -4.05 1.930 896  -2.098  0.0362
#>  effort29.7 - effort34.8    -2.01 0.956 896  -2.098  0.0362
#> 
#> Results are averaged over the levels of: hours

The three p-values obtained above correspond to the interaction term in the regression model.

For a professional figure, we refine the visualization using ggplot2:

library(ggplot2)

# Prepare data for plotting
mylist <- list(hours = seq(0, 4, by = 0.4),
               effort = c(effbr, effr, effar))
contcontdat <-
    emmip(contcont,
          effort ~ hours,
          at = mylist,
          CIs = TRUE,
          plotit = FALSE)

# Convert effort levels to factors
contcontdat$feffort <- factor(contcontdat$effort)
levels(contcontdat$feffort) <- c("low", "medium", "high")
ggplot(data = contcontdat,
       aes(x = hours, y = yvar, color = feffort)) +
    geom_line() +
    geom_ribbon(aes(ymax = UCL, ymin = LCL, fill = feffort),
                alpha = 0.4) + labs(x = "Exercise Hours",
                                    y = "Weight Loss",
                                    color = "Effort",
                                    fill = "Effort Level") +
    theme_minimal()
Plot showing weight loss over exercise hours with varying effort levels. The x-axis represents exercise hours from 0 to 4, and the y-axis represents weight loss. Three colored lines indicate effort levels: red for low, green for medium, and blue for high. The blue line has the steepest positive slope. The chart shows that higher effort levels correlate with greater weight loss over time. A legend on the right labels the effort levels.

Figure 2.7: Weight Loss vs Exercise Hours

18.7.1.2 Continuous by Categorical Interaction

Next, we examine an interaction where hours (continuous) interacts with gender (categorical). We set “Female” as the reference category:

dat$gender <- relevel(dat$gender, ref = "female")
contcat <- lm(loss ~ hours * gender, data = dat)
summary(contcat)
#> 
#> Call:
#> lm(formula = loss ~ hours * gender, data = dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -27.118 -11.350  -1.963  10.001  42.376 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)         3.335      2.731   1.221    0.222  
#> hours               3.315      1.332   2.489    0.013 *
#> gendermale          3.571      3.915   0.912    0.362  
#> hours:gendermale   -1.724      1.898  -0.908    0.364  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 14.06 on 896 degrees of freedom
#> Multiple R-squared:  0.008433,   Adjusted R-squared:  0.005113 
#> F-statistic:  2.54 on 3 and 896 DF,  p-value: 0.05523

Simple Slopes by Gender

# Compute simple slopes for each gender
emtrends(contcat, ~ gender, var = "hours")
#>  gender hours.trend   SE  df lower.CL upper.CL
#>  female        3.32 1.33 896    0.702     5.93
#>  male          1.59 1.35 896   -1.063     4.25
#> 
#> Confidence level used: 0.95

# Test slope differences
emtrends(contcat, pairwise ~ gender, var = "hours")
#> $emtrends
#>  gender hours.trend   SE  df lower.CL upper.CL
#>  female        3.32 1.33 896    0.702     5.93
#>  male          1.59 1.35 896   -1.063     4.25
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast      estimate  SE  df t.ratio p.value
#>  female - male     1.72 1.9 896   0.908  0.3639

Since this test is equivalent to the interaction term in the regression model, a significant result confirms a moderating effect of gender.

mylist <- list(hours = seq(0, 4, by = 0.4),
               gender = c("female", "male"))
emmip(contcat, gender ~ hours, at = mylist, CIs = TRUE)
<img src=“18-moderation_files/figure-html/unnamed-chunk-11-1.png” alt=“Line chart showing linear predictions over time in hours, with separate lines for two groups labeled as”female” and “male.” The x-axis represents hours, ranging from 0 to 4, and the y-axis represents linear prediction values, ranging from 0 to 20. The red line represents females, and the blue line represents males. Both lines show an upward trend, with the female group having a steeper slope. Error bars are present for each data point. A legend on the right indicates the color coding for each group.” width=“90%” />

Figure 2.12: Linear prediction vs hours

18.7.1.3 Categorical by Categorical Interaction

Now, we examine the interaction between two categorical variables: gender (male, female) and prog (exercise program). We set “Read” as the reference category for prog and “Female” for gender:

dat$prog   <- relevel(dat$prog, ref = "read")
dat$gender <- relevel(dat$gender, ref = "female")

catcat <- lm(loss ~ gender * prog, data = dat)
summary(catcat)
#> 
#> Call:
#> lm(formula = loss ~ gender * prog, data = dat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -19.1723  -4.1894  -0.0994   3.7506  27.6939 
#> 
#> Coefficients:
#>                     Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)          -3.6201     0.5322  -6.802 1.89e-11 ***
#> gendermale           -0.3355     0.7527  -0.446    0.656    
#> progjog               7.9088     0.7527  10.507  < 2e-16 ***
#> progswim             32.7378     0.7527  43.494  < 2e-16 ***
#> gendermale:progjog    7.8188     1.0645   7.345 4.63e-13 ***
#> gendermale:progswim  -6.2599     1.0645  -5.881 5.77e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 6.519 on 894 degrees of freedom
#> Multiple R-squared:  0.7875, Adjusted R-squared:  0.7863 
#> F-statistic: 662.5 on 5 and 894 DF,  p-value: < 2.2e-16

Simple Effects and Contrast Analysis

# Estimated marginal means for all combinations of gender and program
emcatcat <- emmeans(catcat, ~ gender * prog)

# Compare effects of gender within each program
contrast(emcatcat, "revpairwise", by = "prog", adjust = "bonferroni")
#> prog = read:
#>  contrast      estimate    SE  df t.ratio p.value
#>  male - female   -0.335 0.753 894  -0.446  0.6559
#> 
#> prog = jog:
#>  contrast      estimate    SE  df t.ratio p.value
#>  male - female    7.483 0.753 894   9.942  <.0001
#> 
#> prog = swim:
#>  contrast      estimate    SE  df t.ratio p.value
#>  male - female   -6.595 0.753 894  -8.762  <.0001
emmip(catcat, prog ~ gender, CIs = TRUE)
Plot showing linear predictions based on gender levels for three activities: reading, jogging, and swimming. The x-axis represents gender levels, labeled as female and male, while the y-axis shows linear prediction values. The chart includes three lines: red for reading, green for jogging, and blue for swimming. The legend on the right identifies the activities by color.

Figure 2.15: Linear prediction vs Gender

For a more intuitive presentation, we use a bar graph with error bars

# Prepare data
catcatdat <- emmip(catcat,
                   gender ~ prog,
                   CIs = TRUE,
                   plotit = FALSE)
ggplot(data = catcatdat,
       aes(x = prog, y = yvar, fill = gender)) +
    geom_bar(stat = "identity", position = "dodge") + geom_errorbar(
        position = position_dodge(.9),
        width = .25,
        aes(ymax = UCL, ymin = LCL),
        alpha = 0.3
    )  + labs(x = "Exercise Program",
              y = "Weight Loss",
              fill = "Gender")
Bar chart showing weight loss across different exercise programs: reading, jogging, and swimming, categorized by gender. The y-axis represents weight loss, and the x-axis lists exercise programs. Females, represented by red bars, show the highest weight loss in swimming, followed by jogging and reading. Males, shown in blue bars, also have the highest weight loss in swimming, with jogging and reading following. Error bars indicate variability in data. A legend on the right identifies the colors for each gender.

Figure 2.17: Weigth Loss vs Exercise Program

18.7.2 probemod Package

The probemod package is designed for moderation analysis, particularly focusing on Johnson-Neyman intervals and simple slopes analysis. However, this package is not recommended due to known issues with subscript handling and formatting errors in some outputs.

install.packages("probemod", dependencies = T)

The Johnson-Neyman technique identifies values of the moderator (gender) where the effect of the independent variable (hours) on the dependent variable (loss) is statistically significant. This method is particularly useful when the moderator is continuous but can also be applied to categorical moderators.

Example: J-N Analysis in a loss ~ hours * gender Model

library(probemod)

myModel <-
    lm(loss ~ hours * gender, data = dat %>% 
           select(loss, hours, gender))

jnresults <- jn(myModel,
                dv = 'loss',
                iv = 'hours',
                mod = 'gender')

The jn() function computes Johnson-Neyman intervals, highlighting the values of gender at which the relationship between hours and loss is statistically significant.

The Pick-a-Point method tests the simple effect of hours at specific values of gender, akin to spotlight analysis.

pickapoint(
    myModel,
    dv = 'loss',
    iv = 'hours',
    mod = 'gender',
    alpha = .01
)

plot(jnresults)

18.7.3 interactions Package

The interactions package is a recommended tool for visualizing and interpreting interaction effects in regression models. It provides user-friendly functions for interaction plots, simple slopes analysis, and Johnson-Neyman intervals, making it an excellent choice for moderation analysis.

install.packages("interactions")

18.7.3.1 Continuous by Continuous Interaction

This section covers interactions where at least one of the two variables is continuous.

Example: Interaction Between Illiteracy and Murder

We use the state.x77 dataset to explore how Illiteracy Rate and Murder Rate interact to predict Income across U.S. states.

states <- as.data.frame(state.x77)
fiti <- lm(Income ~ Illiteracy * Murder + `HS Grad`, data = states)
summary(fiti)
#> 
#> Call:
#> lm(formula = Income ~ Illiteracy * Murder + `HS Grad`, data = states)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -916.27 -244.42   28.42  228.14 1221.16 
#> 
#> Coefficients:
#>                   Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)        1414.46     737.84   1.917  0.06160 .  
#> Illiteracy          753.07     385.90   1.951  0.05724 .  
#> Murder              130.60      44.67   2.923  0.00540 ** 
#> `HS Grad`            40.76      10.92   3.733  0.00053 ***
#> Illiteracy:Murder   -97.04      35.86  -2.706  0.00958 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 459.5 on 45 degrees of freedom
#> Multiple R-squared:  0.4864, Adjusted R-squared:  0.4407 
#> F-statistic: 10.65 on 4 and 45 DF,  p-value: 3.689e-06

For continuous moderators, the standard values chosen for visualization are:

  • Mean + 1 SD

  • Mean

  • Mean - 1 SD

The interact_plot() function provides an easy way to visualize these effects.

library(interactions)
interact_plot(fiti,
              pred = Illiteracy,
              modx = Murder,
              
              # Disable automatic mean-centering
              centered = "none", 
              
              # Exclude the mean value of the moderator
              # modx.values = "plus-minus", 
              
              # Divide the moderator's distribution into three groups
              # modx.values = "terciles", 
              
              plot.points = TRUE, # Overlay raw data
              
              # Different shapes for different levels of the moderator
              point.shape = TRUE, 
              
              # Jittering to prevent overplotting
              jitter = 0.1, 
              
              # Custom appearance
              x.label = "Illiteracy Rate (%)", 
              y.label = "Income ($)", 
              main.title = "Interaction Between Illiteracy and Murder Rate",
              legend.main = "Murder Rate Levels",
              colors = "blue",
              
              # Confidence bands
              interval = TRUE, 
              int.width = 0.9, 
              robust = TRUE # Use robust standard errors
              ) 
<img src=“18-moderation_files/figure-html/unnamed-chunk-22-1.png” alt=“Scatter plot titled”Interaction Between Illiteracy and Murder Rate” showing the relationship between illiteracy rate in percentage on the x-axis and income in US dollars on the y-axis. Data points are scattered with trend lines for each murder rate level: solid line for plus 1 standard deviation, dashed line for mean, and dotted line for minus 1 standard deviation. Shaded areas represent confidence intervals.” width=“90%” />

Figure 2.22: Interaction Between Illiteracy and Murder Rate

If the model includes weights, they can be incorporated into the visualization.

fiti <- lm(Income ~ Illiteracy * Murder,
           data = states,
           weights = Population)
interact_plot(fiti,
              pred = Illiteracy,
              modx = Murder,
              plot.points = TRUE)
<img src=“18-moderation_files/figure-html/unnamed-chunk-24-1.png” alt=“Bubble chart depicting the relationship between illiteracy and income, with bubble sizes representing data points. The chart includes three trend lines for murder rate level:”plus 1 standard deviation,” “Mean,” and “minus 1 standard deviation.” The x-axis represents illiteracy, and the y-axis represents income, with values ranging from 3000 to 6000. All trend lines have negative slopes, with the trend line for murder rate level of plus one standard deviation above the mean having the steepest slope.” width=“90%” />

Figure 10.12: Bubble Chart between Income and Illiteracy

A partial effect plot shows how the effect of one variable changes across different levels of another variable while controlling for other predictors.

library(ggplot2)
data(cars)

fitc <- lm(cty ~ year + cyl * displ + class + fl + drv, 
           data = mpg)
summary(fitc)
#> 
#> Call:
#> lm(formula = cty ~ year + cyl * displ + class + fl + drv, data = mpg)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.9772 -0.7164  0.0018  0.7690  5.9314 
#> 
#> Coefficients:
#>                   Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     -200.97599   47.00954  -4.275 2.86e-05 ***
#> year               0.11813    0.02347   5.033 1.01e-06 ***
#> cyl               -1.85648    0.27745  -6.691 1.86e-10 ***
#> displ             -3.56467    0.65943  -5.406 1.70e-07 ***
#> classcompact      -2.60177    0.92972  -2.798 0.005597 ** 
#> classmidsize      -2.62996    0.93273  -2.820 0.005253 ** 
#> classminivan      -4.40817    1.03844  -4.245 3.24e-05 ***
#> classpickup       -4.37322    0.93416  -4.681 5.02e-06 ***
#> classsubcompact   -2.38384    0.92943  -2.565 0.010997 *  
#> classsuv          -4.27352    0.86777  -4.925 1.67e-06 ***
#> fld                6.34343    1.69499   3.742 0.000233 ***
#> fle               -4.57060    1.65992  -2.754 0.006396 ** 
#> flp               -1.91733    1.58649  -1.209 0.228158    
#> flr               -0.78873    1.56551  -0.504 0.614901    
#> drvf               1.39617    0.39660   3.520 0.000525 ***
#> drvr               0.48740    0.46113   1.057 0.291694    
#> cyl:displ          0.36206    0.07934   4.564 8.42e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.526 on 217 degrees of freedom
#> Multiple R-squared:  0.8803, Adjusted R-squared:  0.8715 
#> F-statistic: 99.73 on 16 and 217 DF,  p-value: < 2.2e-16
interact_plot(
    fitc,
    pred = displ,
    modx = cyl,
    
    # Show observed data as partial residuals
    partial.residuals = TRUE, 
    
    # Specify moderator values manually
    modx.values = c(4, 5, 6, 8)
)
Scatter plot showing the relationship between engine displacement (displ) on the x-axis and city fuel efficiency (cty) on the y-axis. Data points are color-coded by the number of cylinders (cyl), with a legend indicating line styles for 4, 5, 6, and 8 cylinders. Trend lines for each cylinder category show a general downward trend, indicating that as displacement increases, city fuel efficiency decreases.

Figure 10.14: Scatter Plot between cty and displ

To check whether an interaction is truly linear, we can compare fitted lines based on:

  • The whole sample (black line)

  • Subsamples based on the moderator (red line)

# Generate synthetic data
x_2 <- runif(n = 200, min = -3, max = 3)
w   <- rbinom(n = 200, size = 1, prob = 0.5)
err <- rnorm(n = 200, mean = 0, sd = 4)
y_2 <- 2.5 - x_2 ^ 2 - 5 * w + 2 * w * (x_2 ^ 2) + err

data_2 <- as.data.frame(cbind(x_2, y_2, w))

# Fit interaction model
model_2 <- lm(y_2 ~ x_2 * w, data = data_2)
summary(model_2)
#> 
#> Call:
#> lm(formula = y_2 ~ x_2 * w, data = data_2)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -13.7445  -3.2845   0.0759   3.5137  14.4693 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  0.03405    0.52038   0.065    0.948
#> x_2          0.04027    0.31969   0.126    0.900
#> w            0.94696    0.72213   1.311    0.191
#> x_2:w        0.03622    0.42141   0.086    0.932
#> 
#> Residual standard error: 5.098 on 196 degrees of freedom
#> Multiple R-squared:  0.009305,   Adjusted R-squared:  -0.005858 
#> F-statistic: 0.6137 on 3 and 196 DF,  p-value: 0.6069
# Linearity check plot
interact_plot(
    model_2,
    pred = x_2,
    modx = w,
    linearity.check = TRUE,
    plot.points = TRUE
)
<img src=“18-moderation_files/figure-html/unnamed-chunk-28-1.png” alt=“Two X-Y charts comparing data distributions with different weights. The left chart, labeled”w equals 1,” shows a scatter plot with a red parabolic curve and a black linear trend line. The right chart, labeled “w equals 0,” displays a similar scatter plot with a red inverted parabolic curve and a black linear trend line. Both charts have axes labeled “x2” and “y2,” with data points scattered around the curves and lines.” width=“90%” />

Figure 10.16: Interaction Plot

18.7.3.1.1 Simple Slopes Analysis

A simple slopes analysis examines the conditional effect of an independent variable (\(X\)) at specific levels of the moderator (\(M\)).

How sim_slopes() Works:

  • Continuous moderators: Analyzes effects at the mean and ±1 SD.

  • Categorical moderators: Uses all factor levels.

  • Mean-centers all variables except the predictor of interest.

Example: Continuous by Continuous Interaction

sim_slopes(fiti,
           pred = Illiteracy,
           modx = Murder,
           johnson_neyman = FALSE)
#> SIMPLE SLOPES ANALYSIS
#> 
#> Slope of Illiteracy when Murder =  5.420973 (- 1 SD): 
#> 
#>     Est.     S.E.   t val.      p
#> -------- -------- -------- ------
#>   -17.43   250.08    -0.07   0.94
#> 
#> Slope of Illiteracy when Murder =  8.685043 (Mean): 
#> 
#>      Est.     S.E.   t val.      p
#> --------- -------- -------- ------
#>   -399.64   178.86    -2.23   0.03
#> 
#> Slope of Illiteracy when Murder = 11.949113 (+ 1 SD): 
#> 
#>      Est.     S.E.   t val.      p
#> --------- -------- -------- ------
#>   -781.85   189.11    -4.13   0.00

We can also visualize the simple slopes

# Store results
ss <- sim_slopes(fiti,
                 pred = Illiteracy,
                 modx = Murder,
                 modx.values = c(0, 5, 10))
plot(ss)
Chart displaying the relationship between the slope of illiteracy and murder rates at three levels: 0.00, 5.00, and 10.00. Each level is represented by a horizontal line with a central point, indicating the slope’s value range. The x-axis is labeled “Slope of Illiteracy,” ranging from -1000 to 1500.

Figure 18.1: Slope of Illiteracy

For publication-quality results, we convert the simple slopes analysis into a table using huxtable.

library(huxtable)

ss <- sim_slopes(fiti,
                 pred = Illiteracy,
                 modx = Murder,
                 modx.values = c(0, 5, 10))

# Convert to a formatted table
as_huxtable(ss)
Value of Murder Slope of Illiteracy
0.00 617.34 (434.85)
5.00 31.86 (262.63)
10.00 -553.62 (171.42)**
18.7.3.1.2 Johnson-Neyman Intervals

The Johnson-Neyman technique identifies the range of the moderator (\(M\)) where the effect of the predictor (\(X\)) on the dependent variable (\(Y\)) is statistically significant. This approach is useful when the moderator is continuous, allowing us to determine where an effect exists rather than arbitrarily choosing values.

Although the J-N approach has been widely used (P. O. Johnson and Neyman 1936), it has known inflated Type I error rates (Bauer and Curran 2005). A correction method was proposed by (Esarey and Sumner 2018) to address these issues.

Since J-N performs multiple comparisons across all values of the moderator, it inflates Type I error. To control for this, we use False Discovery Rate correction.

Example: Johnson-Neyman Analysis

sim_slopes(
    fiti,
    pred = Illiteracy,
    modx = Murder,
    johnson_neyman = TRUE,
    control.fdr = TRUE,  # Correction for Type I and II errors
    
    # Include conditional intercepts
    # cond.int = TRUE, 
    
    robust = "HC3",  # Use robust SE
    
    # Don't mean-center non-focal variables
    # centered = "none",
    
    jnalpha = 0.05  # Significance level
)
#> JOHNSON-NEYMAN INTERVAL
#> 
#> When Murder is OUTSIDE the interval [-7.87, 8.51], the slope of Illiteracy
#> is p < .05.
#> 
#> Note: The range of observed values of Murder is [1.40, 15.10]
#> 
#> Interval calculated using false discovery rate adjusted t = 2.35 
#> 
#> SIMPLE SLOPES ANALYSIS
#> 
#> Slope of Illiteracy when Murder =  5.420973 (- 1 SD): 
#> 
#>     Est.     S.E.   t val.      p
#> -------- -------- -------- ------
#>   -17.43   227.37    -0.08   0.94
#> 
#> Slope of Illiteracy when Murder =  8.685043 (Mean): 
#> 
#>      Est.     S.E.   t val.      p
#> --------- -------- -------- ------
#>   -399.64   158.77    -2.52   0.02
#> 
#> Slope of Illiteracy when Murder = 11.949113 (+ 1 SD): 
#> 
#>      Est.     S.E.   t val.      p
#> --------- -------- -------- ------
#>   -781.85   156.96    -4.98   0.00

To visualize the J-N intervals

johnson_neyman(fiti,
               pred = Illiteracy,
               modx = Murder,
               control.fdr = TRUE, # Corrects for Type I error
               alpha = .05)
#> JOHNSON-NEYMAN INTERVAL
#> 
#> When Murder is OUTSIDE the interval [-22.57, 8.52], the slope of Illiteracy
#> is p < .05.
#> 
#> Note: The range of observed values of Murder is [1.40, 15.10]
#> 
#> Interval calculated using false discovery rate adjusted t = 2.33
Johnson–Neyman plot illustrating the relationship between the slope of illiteracy and murder rates. The x-axis represents murder rates, while the y-axis shows the slope of illiteracy. A red line with a shaded area indicates non-significant results (n.s.), and a blue line with a shaded area indicates significant results (p is smaller than 0.05). A black horizontal line represents the range of observed data.
  • The y-axis represents the conditional slope of the predictor (\(X\)).
  • The x-axis represents the values of the moderator (\(M\)).

  • The shaded region represents the range where the effect of \(X\) on \(Y\) is statistically significant.

18.7.3.1.3 Three-Way Interactions

In three-way interactions, the effect of \(X\) on \(Y\) depends on two moderators (\(M_1\) and \(M_2\)). This allows for a more nuanced understanding of moderation effects.

Example: 3-Way Interaction Visualization

library(jtools)
# Convert 'cyl' to factor
mtcars$cyl <- factor(mtcars$cyl,
                     labels = c("4 cylinder", "6 cylinder", "8 cylinder"))

# Fit the model
fitc3 <- lm(mpg ~ hp * wt * cyl, data = mtcars)
interact_plot(fitc3,
              pred = hp,
              modx = wt,
              mod2 = cyl) +
    theme_apa(legend.pos = "bottomright")
Three-panel X-Y chart illustrating the Johnson-Neyman technique for three-way interaction. Each panel represents different cylinder counts: 4, 6, and 8. The x-axis shows horsepower (hp), and the y-axis shows miles per gallon (mpg). Lines represent different standard deviations: solid for plus 1 standard deviation, dashed for mean, and dotted for minus 1 standard deviation. The chart highlights how mpg varies with hp across different cylinder counts.

Figure 18.2: Interaction Plot

18.7.3.1.4 Johnson-Neyman for Three-Way Interaction

The Johnson-Neyman technique can also be applied in a three-way interaction context

library(survey)
data(api)

# Define survey design
dstrat <- svydesign(
    id = ~ 1,
    strata = ~ stype,
    weights = ~ pw,
    data = apistrat,
    fpc = ~ fpc
)

# Fit 3-way interaction model
regmodel3 <-
    survey::svyglm(api00 ~ avg.ed * growth * enroll, design = dstrat)
# Johnson-Neyman analysis with visualization
sim_slopes(
    regmodel3,
    pred = growth,
    modx = avg.ed,
    mod2 = enroll,
    jnplot = TRUE
)
#> ███████████████ While enroll (2nd moderator) =  153.0518 (- 1 SD) ██████████████ 
#> 
#> JOHNSON-NEYMAN INTERVAL
#> 
#> When avg.ed is OUTSIDE the interval [2.75, 3.82], the slope of growth is p
#> < .05.
#> 
#> Note: The range of observed values of avg.ed is [1.38, 4.44]
#> 
#> SIMPLE SLOPES ANALYSIS
#> 
#> Slope of growth when avg.ed = 2.085002 (- 1 SD): 
#> 
#>   Est.   S.E.   t val.      p
#> ------ ------ -------- ------
#>   1.25   0.32     3.86   0.00
#> 
#> Slope of growth when avg.ed = 2.787381 (Mean): 
#> 
#>   Est.   S.E.   t val.      p
#> ------ ------ -------- ------
#>   0.39   0.22     1.75   0.08
#> 
#> Slope of growth when avg.ed = 3.489761 (+ 1 SD): 
#> 
#>    Est.   S.E.   t val.      p
#> ------- ------ -------- ------
#>   -0.48   0.35    -1.37   0.17
#> 
#> ████████████████ While enroll (2nd moderator) =  595.2821 (Mean) ███████████████ 
#> 
#> JOHNSON-NEYMAN INTERVAL
#> 
#> When avg.ed is OUTSIDE the interval [2.84, 7.83], the slope of growth is p
#> < .05.
#> 
#> Note: The range of observed values of avg.ed is [1.38, 4.44]
#> 
#> SIMPLE SLOPES ANALYSIS
#> 
#> Slope of growth when avg.ed = 2.085002 (- 1 SD): 
#> 
#>   Est.   S.E.   t val.      p
#> ------ ------ -------- ------
#>   0.72   0.22     3.29   0.00
#> 
#> Slope of growth when avg.ed = 2.787381 (Mean): 
#> 
#>   Est.   S.E.   t val.      p
#> ------ ------ -------- ------
#>   0.34   0.16     2.16   0.03
#> 
#> Slope of growth when avg.ed = 3.489761 (+ 1 SD): 
#> 
#>    Est.   S.E.   t val.      p
#> ------- ------ -------- ------
#>   -0.04   0.24    -0.16   0.87
#> 
#> ███████████████ While enroll (2nd moderator) = 1037.5125 (+ 1 SD) ██████████████ 
#> 
#> JOHNSON-NEYMAN INTERVAL
#> 
#> The Johnson-Neyman interval could not be found. Is the p value for your
#> interaction term below the specified alpha?
#> 
#> SIMPLE SLOPES ANALYSIS
#> 
#> Slope of growth when avg.ed = 2.085002 (- 1 SD): 
#> 
#>   Est.   S.E.   t val.      p
#> ------ ------ -------- ------
#>   0.18   0.31     0.58   0.56
#> 
#> Slope of growth when avg.ed = 2.787381 (Mean): 
#> 
#>   Est.   S.E.   t val.      p
#> ------ ------ -------- ------
#>   0.29   0.20     1.49   0.14
#> 
#> Slope of growth when avg.ed = 3.489761 (+ 1 SD): 
#> 
#>   Est.   S.E.   t val.      p
#> ------ ------ -------- ------
#>   0.40   0.27     1.49   0.14
<img src=“18-moderation_files/figure-html/unnamed-chunk-37-1.png” alt=“Three X-Y charts showing the relationship between”avg.ed” (average education) on the x-axis and “Slope of growth” on the y-axis, with different enrollment levels. The top left chart represents enroll level equaling minus 1 standard deviation with a downward trend and shaded confidence interval. The top right chart shows enroll level equaling the mean with a similar downward trend. The bottom chart illustrates enroll level equaling plus 1 standard deviation with an upward trend. Each chart includes a horizontal line at zero and vertical dashed lines for reference.” width=“90%” />

Figure 13.3: Slope of Growth

To present the results in a publication-ready format, we generate tables and plots

ss3 <-
    sim_slopes(regmodel3,
               pred = growth,
               modx = avg.ed,
               mod2 = enroll)
# Plot results
plot(ss3)
<img src=“18-moderation_files/figure-html/unnamed-chunk-39-1.png” alt=“X-Y chart displaying the relationship between average education levels (avg.ed) and slope of growth for different enrollment values. Three horizontal lines represent different avg.ed values: 2.09, 2.79, and 3.49. Each line has circular markers indicating data points. Enrollment values are labeled as 1037.51, 153.05, and 595.28. The x-axis is labeled”Slope of growth,” ranging from -1 to 2.” width=“90%” />

Figure 18.3: Slope of Growth

# Convert results into a formatted table
library(huxtable)
as_huxtable(ss3)
enroll = 153
Value of avg.ed Slope of growth
2.09 1.25 (0.32)***
2.79 0.39 (0.22)#
3.49 -0.48 (0.35)
enroll = 595.28
Value of avg.ed Slope of growth
2.09 0.72 (0.22)**
2.79 0.34 (0.16)*
3.49 -0.04 (0.24)
enroll = 1037.51
Value of avg.ed Slope of growth
2.09 0.18 (0.31)
2.79 0.29 (0.20)
3.49 0.40 (0.27)

18.7.3.2 Categorical Interactions

Interactions between categorical predictors can be visualized using categorical plots.

Example: Interaction Between cyl, fwd, and auto

library(ggplot2)
library(tidyverse)

# Convert variables to factors
mpg2 <- mpg %>%
    mutate(cyl = factor(cyl))

mpg2["auto"] <- "auto"
mpg2$auto[mpg2$trans %in% c("manual(m5)", "manual(m6)")] <- "manual"
mpg2$auto <- factor(mpg2$auto)

mpg2["fwd"] <- "2wd"
mpg2$fwd[mpg2$drv == "4"] <- "4wd"
mpg2$fwd <- factor(mpg2$fwd)

# Drop cars with 5 cylinders (since most have 4, 6, or 8)
mpg2 <- mpg2[mpg2$cyl != "5",]

# Fit the model
fit3 <- lm(cty ~ cyl * fwd * auto, data = mpg2)

library(jtools) # For summ()
summ(fit3)
Observations 230
Dependent variable cty
Type OLS linear regression
F(11,218) 61.37
0.76
Adj. R² 0.74
Est. S.E. t val. p
(Intercept) 21.37 0.39 54.19 0.00
cyl6 -4.37 0.54 -8.07 0.00
cyl8 -8.37 0.67 -12.51 0.00
fwd4wd -2.91 0.76 -3.83 0.00
automanual 1.45 0.57 2.56 0.01
cyl6:fwd4wd 0.59 0.96 0.62 0.54
cyl8:fwd4wd 2.13 0.99 2.15 0.03
cyl6:automanual -0.76 0.90 -0.84 0.40
cyl8:automanual 0.71 1.18 0.60 0.55
fwd4wd:automanual -1.66 1.07 -1.56 0.12
cyl6:fwd4wd:automanual 1.29 1.52 0.85 0.40
cyl8:fwd4wd:automanual -1.39 1.76 -0.79 0.43
Standard errors: OLS
cat_plot(fit3,
         pred = cyl,
         modx = fwd,
         plot.points = TRUE)
Scatter plot showing the relationship between the number of cylinders (cyl) on the x-axis and city fuel efficiency (cty) on the y-axis. Data points are color-coded by drivetrain type: blue for 2-wheel drive (2wd) and orange for 4-wheel drive (4wd). Each group of data points is accompanied by horizontal lines indicating the mean and standard deviation. The plot illustrates a general trend of decreasing city fuel efficiency with an increasing number of cylinders.

Figure 18.4: Scatter Plot

Line Plot for Categorical Interaction

cat_plot(
    fit3,
    pred = cyl,
    modx = fwd,
    geom = "line",
    point.shape = TRUE,
    vary.lty = TRUE
)
Line chart showing the relationship between the number of cylinders (cyl) and city fuel efficiency (cty) for two types of wheel drive: 2wd and 4wd. The x-axis represents the number of cylinders, while the y-axis shows city fuel efficiency. The blue solid line with circles represents 2wd, and the orange dashed line with triangles represents 4wd. Error bars indicate variability in the data. A legend on the right identifies the line styles for each wheel drive type.

Figure 18.5: Line Chart between cty and cyl

Bar Plot Representation

cat_plot(
    fit3,
    pred = cyl,
    modx = fwd,
    geom = "bar",
    interval = TRUE,
    plot.points = TRUE
)
Bar chart showing city mileage (cty) versus the number of cylinders (cyl) for vehicles with two-wheel drive (2wd) and four-wheel drive (4wd). Bars are grouped by cylinder count: 4, 6, and 8. Blue bars represent 2wd, and orange bars represent 4wd. Data points are overlaid as dots, indicating individual values. A legend on the right distinguishes between 2wd and 4wd.

Figure 18.6: Bar Chart between cty and cyl

18.7.4 interactionR Package

The interactionR package is designed for publication-quality reporting of interaction effects, particularly in epidemiology and social sciences. It provides tools for computing interaction measures, confidence intervals, and statistical inference following well-established methodologies.

Key Features:

install.packages("interactionR", dependencies = T)

18.7.5 sjPlot Package

The sjPlot package is highly recommended for publication-quality visualizations of interaction effects. It provides enhanced aesthetics and customizable interaction plots suitable for academic journals.

More details: sjPlot interaction visualization

18.7.6 Summary of Moderation Analysis Packages

Summary of Moderation Analysis Packages
Package Purpose Key Features Recommended?
emmeans Estimated marginal means & simple slopes Computes simple slopes, spotlight analysis, floodlight analysis (J-N method) Yes
probemod Johnson-Neyman technique Tests moderator significance ranges No (Subscript issues)
interactions Interaction visualization Produces robust, customizable interaction plots Yes
interactionR Epidemiological interaction measures Computes RERI, AP, SI for additive scale interactions Yes (for public health research)
sjPlot Publication-quality interaction plots Highly customizable, ideal for academic papers Highly Recommended