Chapter 5 Vectors & Recycling
What You’ll Learn:
- How R’s vector recycling works
- When recycling helps and when it hurts
- Length mismatch errors
- Replacement length errors
- Vectorization best practices
Key Errors Covered: 15+ recycling and length errors
Difficulty: ⭐ Beginner to ⭐⭐ Intermediate
5.1 Introduction
R’s superpower is vectorization - operations work on entire vectors at once. But with this power comes a quirky feature called recycling that causes endless confusion.
But what about this?
It works! But is this what you wanted? Let’s explore when recycling helps and when it causes errors.
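The two situations this introduction contrasts can be sketched as follows. The vectors are illustrative assumptions, not the chapter's original examples: the first operation applies a scalar to every element, the second mixes two vectors of different lengths.

```r
# Illustrative vectors (assumed for this sketch)
prices <- c(10, 20, 30)

# Vectorized: the scalar 2 is applied to every element
doubled <- prices * 2
doubled
#> [1] 20 40 60

# Different lengths: the shorter vector is recycled silently
mixed <- c(1, 2, 3, 4) + c(10, 20)
mixed
#> [1] 11 22 13 24
```

The second result is valid R, but whether pairing 10 and 20 with alternating elements was intended is a separate question.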
5.2 Understanding Recycling
💡 Key Insight: The Recycling Rule
When vectors of different lengths are used together, R repeats the shorter one to match the longer one.
# What happens:
c(1, 2, 3, 4) + c(10, 20)
#> [1] 11 22 13 24
# R expands to:
c(1, 2, 3, 4) + c(10, 20, 10, 20)
#> [1] 11 22 13 24
#                         ↑   ↑ recycled!
Works smoothly when:
- One vector is length 1 (scalar)
- Lengths are multiples (2 and 4, 3 and 6)
Warns when:
- Lengths aren’t multiples (3 and 5)
Errors when:
- A data frame column is replaced with a length that doesn’t divide the number of rows
- The replacement has length zero
5.3 Error #1: longer object length is not a multiple
⭐ BEGINNER 📏 LENGTH
5.3.1 The Warning
c(1, 2, 3) + c(10, 20, 30, 40, 50)
#> Warning in c(1, 2, 3) + c(10, 20, 30, 40, 50): longer object length is not a
#> multiple of shorter object length
#> [1] 11 22 33 41 52
🟡 WARNING
Warning message:
In c(1, 2, 3) + c(10, 20, 30, 40, 50) :
longer object length is not a multiple of shorter object length
5.3.2 What It Means
R is recycling, but the lengths don’t match evenly. This usually indicates a mistake.
5.3.3 Common Causes
5.3.3.1 Cause 1: Data Mismatch
# You have 100 observations
data <- rnorm(100)
# But only 3 group labels
groups <- c("A", "B", "C")
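# data.frame() will not recycle here: 100 is not a multiple of 3
combined <- data.frame(value = data, group = groups)
#> Error in data.frame(value = data, group = groups): arguments imply differing number of rows: 100, 3
The message is R asking: “Hey, are you sure about this?” Here data.frame() stops with an error rather than a warning, because 100 rows cannot be filled evenly by 3 labels.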
5.3.3.2 Cause 2: Filtering Gone Wrong
x <- 1:10
y <- 1:7 # Oops, lost some values
# Operations warn
x + y
#> Warning in x + y: longer object length is not a multiple of shorter object
#> length
#> [1] 2 4 6 8 10 12 14 9 11 13
x * y
#> Warning in x * y: longer object length is not a multiple of shorter object
#> length
#> [1] 1 4 9 16 25 36 49 8 18 30
5.3.3.3 Cause 3: Unintended Partial Matching
treatment <- c("Drug", "Placebo")
outcomes <- rnorm(25) # 25 subjects
# Assigning treatment to outcomes
data.frame(outcome = outcomes, treatment = treatment)
#> Error in data.frame(outcome = outcomes, treatment = treatment): arguments imply differing number of rows: 25, 2
The problem: 25 is not a multiple of 2, so data.frame() refuses to recycle the labels.
5.3.4 Solutions
✅ SOLUTION 1: Fix the Lengths
# Original problem
x <- 1:10
y <- 1:7
# Option A: Trim to match
min_len <- min(length(x), length(y))
x[1:min_len] + y[1:min_len]
#> [1] 2 4 6 8 10 12 14
# Option B: Extend with NA
y_extended <- c(y, rep(NA, length(x) - length(y)))
x + y_extended
#> [1] 2 4 6 8 10 12 14 NA NA NA
# Option C: Explicit recycling (if intentional)
y_recycled <- rep(y, length.out = length(x))
x + y_recycled
#> [1] 2 4 6 8 10 12 14 9 11 13
✅ SOLUTION 2: Check Lengths Before Operating
safe_operation <- function(x, y, op = `+`) {
  if (length(x) != length(y)) {
    # Check if one is length 1 (scalar - OK)
    if (length(x) == 1 || length(y) == 1) {
      return(op(x, y))
    }
    # Check if lengths are multiples
    if (max(length(x), length(y)) %% min(length(x), length(y)) != 0) {
      warning("Lengths are not multiples: ",
              length(x), " and ", length(y))
    }
  }
  return(op(x, y))
}
# Test
safe_operation(1:10, 1:7, `+`) # Warns
#> Warning in safe_operation(1:10, 1:7, `+`): Lengths are not multiples: 10 and 7
#> Warning in op(x, y): longer object length is not a multiple of shorter object
#> length
#> [1] 2 4 6 8 10 12 14 9 11 13
safe_operation(1:10, 1:5, `+`) # No warning (10/5 = 2)
#> [1] 2 4 6 8 10 7 9 11 13 15
safe_operation(1:10, 2, `+`) # No warning (scalar)
#> [1] 3 4 5 6 7 8 9 10 11 12
✅ SOLUTION 3: Use rep() Explicitly
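A minimal sketch of spelling out the repetition with rep(), so the reader of the code can see which pattern was meant. The variable names y_each, y_times, and y_fill are illustrative.

```r
x <- 1:6
y <- c(10, 20, 30)

# each = : repeat every element in place -> 10 10 20 20 30 30
y_each <- rep(y, each = 2)

# times = : repeat the whole vector -> 10 20 30 10 20 30
y_times <- rep(y, times = 2)

# length.out = : keep repeating until the target length is reached
y_fill <- rep(y, length.out = length(x))

x + y_each
#> [1] 11 12 23 24 35 36
x + y_times
#> [1] 11 22 33 14 25 36
```

Implicit recycling always behaves like times =; if the each = pattern was what you wanted, only an explicit rep() will give it to you.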
⚠️ Common Pitfall: Silent Recycling with Multiples
# No warning when lengths are multiples!
x <- 1:6
y <- c(10, 20, 30) # 6 is multiple of 3
result <- x + y
result
#> [1] 11 22 33 14 25 36
# R expanded y to: c(10, 20, 30, 10, 20, 30)
# Was this intended?
Always check: Just because it doesn’t warn doesn’t mean it’s correct!
5.4 Error #2: replacement has X rows, data has Y
⭐⭐ INTERMEDIATE 📏 LENGTH
5.4.1 The Error
df <- data.frame(x = 1:5, y = 6:10)
df$z <- 1:3 # Wrong length!
#> Error in `$<-.data.frame`(`*tmp*`, z, value = 1:3): replacement has 3 rows, data has 5
🔴 ERROR
Error in `$<-.data.frame`(`*tmp*`, z, value = 1:3) :
replacement has 3 rows, data has 5
5.4.2 What It Means
You’re trying to add/replace a column, but the number of values doesn’t match the number of rows.
5.4.3 Common Causes
5.4.3.2 Cause 2: Filtered Data Reassignment
df <- data.frame(x = 1:10, y = rnorm(10))
# Filter
subset_df <- df[df$y > 0, ] # Maybe 6 rows
# Create column for subset
new_values <- 1:6
# Try to add to original
df$new <- new_values # Error! Original has 10 rows
#> Error in `$<-.data.frame`(`*tmp*`, new, value = 1:6): replacement has 6 rows, data has 10
5.4.4 Solutions
✅ SOLUTION 1: Match the Length
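One way to match the length, sketched under the assumption that the new values belong only to the filtered rows: create a full-length column of NA first, then fill just the rows that passed the filter. The seed and the names keep and new_values are illustrative choices.

```r
set.seed(42)  # assumed seed, only to make the sketch reproducible
df <- data.frame(x = 1:10, y = rnorm(10))

keep <- df$y > 0                  # logical index, one entry per row
new_values <- seq_len(sum(keep))  # one value per kept row

# Start with a full-length column of NA, then fill only the kept rows
df$new <- NA_integer_
df$new[keep] <- new_values

# The new column now has exactly one entry per row
length(df$new) == nrow(df)
#> [1] TRUE
```

Rows that did not pass the filter stay NA, which makes the missing values visible instead of silently recycled.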
✅ SOLUTION 2: Use Merge/Join for Aggregates
# Original data
df <- data.frame(
  id = 1:20,
  group = rep(c("A", "B"), each = 10),
  value = rnorm(20)
)
# Aggregate
group_summary <- aggregate(value ~ group, df, mean)
names(group_summary)[2] <- "group_mean"
# Merge back
df <- merge(df, group_summary, by = "group")
head(df)
#> group id value group_mean
#> 1 A 1 -0.7212893 -0.5035515
#> 2 A 2 -0.3361355 -0.5035515
#> 3 A 3 -0.5519150 -0.5035515
#> 4 A 4 0.1108687 -0.5035515
#> 5 A 5 0.5672052 -0.5035515
#> 6 A 6 -2.0882567 -0.5035515
✅ SOLUTION 3: dplyr Way (Cleaner)
library(dplyr)
df <- data.frame(
  id = 1:20,
  group = rep(c("A", "B"), each = 10),
  value = rnorm(20)
)
# Add group mean to each row
df <- df %>%
  group_by(group) %>%
  mutate(group_mean = mean(value)) %>%
  ungroup()
head(df)
#> # A tibble: 6 × 4
#> id group value group_mean
#> <int> <chr> <dbl> <dbl>
#> 1 1 A -0.248 -0.782
#> 2 2 A -1.84 -0.782
#> 3 3 A -0.314 -0.782
#> 4 4 A -0.769 -0.782
#> 5 5 A -0.802 -0.782
#> 6 6 A -0.512 -0.782
5.5 Error #3: number of items to replace is not a multiple
⭐⭐ INTERMEDIATE 📏 LENGTH
5.5.1 The Error
x <- 1:10
x[1:7] <- c(100, 200) # 7 positions, 2 values
#> Warning in x[1:7] <- c(100, 200): number of items to replace is not a multiple
#> of replacement length
🟡 WARNING
Warning message:
In x[1:7] <- c(100, 200) :
number of items to replace is not a multiple of replacement length
5.5.3 When This Happens
# Replacing 10 items with 3 values
x <- 1:10
x[] <- c(1, 2, 3) # 10 is not a multiple of 3
#> Warning in x[] <- c(1, 2, 3): number of items to replace is not a multiple of
#> replacement length
# Replacing 7 items with 2 values
x[1:7] <- c(10, 20) # 7 is not a multiple of 2
#> Warning in x[1:7] <- c(10, 20): number of items to replace is not a multiple of
#> replacement length
But these work:
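As a hedged sketch of replacements that recycle without complaint (the specific positions and values are illustrative): the number of positions is either a multiple of the replacement length, or the replacement is a single value.

```r
x <- 1:10

# 6 positions, 2 values: 6 is a multiple of 2, so recycling is silent
x[1:6] <- c(100, 200)

# Any number of positions, 1 value: a scalar always recycles
x[7:10] <- 0

# x is now 100 200 100 200 100 200 0 0 0 0
x
```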
5.6 Error #4: replacement has length zero
⭐⭐ INTERMEDIATE 📏 LENGTH
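A minimal sketch of how this error can arise, assuming a lookup that finds no match. The names lookup and hit are illustrative; the point is that a zero-length result cannot be recycled into even one position.

```r
x <- 1:5
lookup <- c(a = 10, b = 20)

# A lookup that finds nothing returns a zero-length vector
hit <- lookup[names(lookup) == "z"]
length(hit)
#> [1] 0

# Assigning it into one position fails; capture the message instead of stopping
msg <- tryCatch(
  {
    x[1] <- hit
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
#> [1] "replacement has length zero"

# Guard: only assign when exactly one value was found
if (length(hit) == 1) x[1] <- hit
```

Checking length() before assigning is usually enough; the failed assignment leaves x unchanged.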
5.7 Vectorization Best Practices
🎯 Best Practice: Length-Safe Operations
# 1. Check lengths match
operate_safely <- function(x, y, fun) {
  if (length(x) != length(y)) {
    stop("Vectors must be same length. Got ",
         length(x), " and ", length(y))
  }
  fun(x, y)
}
# 2. Use recycling intentionally (scalars only)
add_scalar <- function(vec, scalar) {
  stopifnot(length(scalar) == 1)
  vec + scalar
}
# 3. Document recycling behavior
#' Add vectors with explicit recycling
#' @param x numeric vector
#' @param y numeric vector (will be recycled to length of x)
add_with_recycling <- function(x, y) {
  if (length(y) == 1) {
    return(x + y) # Scalar - always OK
  }
  y_recycled <- rep(y, length.out = length(x))
  return(x + y_recycled)
}
5.8 Understanding Vector Operations
💡 Key Insight: What Gets Recycled
# Arithmetic operators
1:4 + c(10, 20) # Addition
#> [1] 11 22 13 24
1:4 - c(10, 20) # Subtraction
#> [1] -9 -18 -7 -16
1:4 * c(2, 3) # Multiplication
#> [1] 2 6 6 12
1:4 / c(2, 4) # Division
#> [1] 0.5 0.5 1.5 1.0
# Logical operators
c(TRUE, FALSE) & c(TRUE, TRUE, FALSE, FALSE)
#> [1] TRUE FALSE FALSE FALSE
c(TRUE, FALSE) | c(FALSE, FALSE, TRUE, TRUE)
#> [1] TRUE FALSE TRUE TRUE
# Comparison operators
1:6 > c(2, 4, 6) # Recycles the shorter vector
#> [1] FALSE FALSE FALSE TRUE TRUE FALSE
# Assignment
x <- 1:12
x[] <- c(1, 2, 3) # Recycles to 12
x
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
Key point: Recycling happens in MANY contexts!
5.9 Edge Cases and Gotchas
5.9.1 Gotcha #1: Matrix Recycling
# Matrices recycle by column!
matrix(1:2, nrow = 3, ncol = 4)
#> [,1] [,2] [,3] [,4]
#> [1,] 1 2 1 2
#> [2,] 2 1 2 1
#> [3,] 1 2 1 2
No warning appears here because 12 (3×4) is a multiple of 2.
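As a hedged sketch of the warning case: matrix() does warn when the data length does not divide the number of cells evenly, for example 5 values into a 3×4 matrix. The exact wording of the warning varies between R versions, so this example only records that one was raised (the flag warned is an illustrative helper).

```r
# 5 values into 3 x 4 = 12 cells: 12 is not a multiple of 5
warned <- FALSE
m <- withCallingHandlers(
  matrix(1:5, nrow = 3, ncol = 4),
  warning = function(w) {
    warned <<- TRUE                  # remember that a warning was raised
    invokeRestart("muffleWarning")   # then continue and return the matrix
  }
)
warned
#> [1] TRUE

# The matrix is still built, filled column by column
m[, 1]
#> [1] 1 2 3
```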
5.9.2 Gotcha #2: Data Frame Column Recycling
# This works - length 1 always recycles
df <- data.frame(
  x = 1:5,
  y = 10 # Recycled to 5
)
df
#> x y
#> 1 1 10
#> 2 2 10
#> 3 3 10
#> 4 4 10
#> 5 5 10
# This also works - 6 is a multiple of 2
df <- data.frame(
  x = 1:6,
  y = c(1, 2) # Recycled to 6
)
df
#> x y
#> 1 1 1
#> 2 2 2
#> 3 3 1
#> 4 4 2
#> 5 5 1
#> 6 6 2
5.9.3 Gotcha #3: List Operations Don’t Recycle
# Vectors recycle
c(1, 2) + c(10, 20, 30) # Works (with warning)
#> Warning in c(1, 2) + c(10, 20, 30): longer object length is not a multiple of
#> shorter object length
#> [1] 11 22 31
# Lists don't
list(1, 2) + list(10, 20, 30) # Error!
#> Error in list(1, 2) + list(10, 20, 30): non-numeric argument to binary operator
Lists need explicit handling:
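One possible way to handle this, sketched with base R's Map() on two equal-length lists (the lists a and b are illustrative):

```r
a <- list(1, 2, 3)
b <- list(10, 20, 30)

# Map() applies `+` to matching elements and returns a list
sums <- Map(`+`, a, b)
str(sums)

# If every element is a single number, unlist and use vector arithmetic
unlist(a) + unlist(b)
#> [1] 11 22 33
```

Map() itself recycles shorter arguments, so checking that the lists have the same length first is still worthwhile.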
5.10 Debugging Recycling Issues
💡 Debugging Checklist
# 1. Check lengths
x <- 1:10
y <- 1:7
length(x)
#> [1] 10
length(y)
#> [1] 7
# 2. Check if they're multiples
max(length(x), length(y)) %% min(length(x), length(y))
#> [1] 3
# 0 = clean multiple, anything else = partial recycling
# 3. Visualize recycling
rep(y, length.out = length(x))
#> [1] 1 2 3 4 5 6 7 1 2 3
# 4. Test operation
tryCatch(
  x + y,
  warning = function(w) {
    message("Warning caught: ", w$message)
  }
)
#> Warning caught: longer object length is not a multiple of shorter object length
# 5. Check for unexpected conversions
class(x); typeof(x)
#> [1] "integer"
#> [1] "integer"
class(y); typeof(y)
#> [1] "integer"
#> [1] "integer"
5.11 Summary
Key Takeaways:
- Recycling is automatic: R repeats shorter vectors to match longer ones
- Warnings appear: When lengths aren’t multiples (except scalars)
- Scalars always work: Length 1 recycles to any length
- Check before operating: Use length() to verify matches
- Explicit is better: Use rep() to show intent
- Data frames are stricter: A column’s length must divide the number of rows evenly (length 1 always works); otherwise you get an error
- Errors vs warnings: Data frame and zero-length replacements error; vector arithmetic and vector replacement warn
Quick Reference:
| Situation | Behavior |
|---|---|
| Same length | No recycling needed |
| One is length 1 | Silent recycling (scalar) |
| Lengths are multiples | Silent recycling (e.g., 2 and 6) |
| Lengths not multiples | Warning + recycling (e.g., 3 and 7) |
| Vector replacement, lengths not multiples | Warning + recycling |
| Data frame replacement, lengths not multiples | Error |
| Replacement, length 0 | Error |
| Data frame column | Error if nrow is not a multiple of its length |
Prevention:
# Always check
stopifnot(length(x) == length(y))
# Or use scalars only
stopifnot(length(y) == 1)
# Or recycle explicitly
y <- rep(y, length.out = length(x))
Remember: No warning doesn’t mean correct! Multiples recycle silently.
5.12 Exercises
📝 Exercise 1: Predict the Outcome
What will happen? Will it work, warn, or error?
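The cases below are taken from the answers in Section 5.13 (labelled A to E there); try to predict each one before running it.

```r
# A
c(1, 2, 3, 4) + c(10, 20)

# B
c(1, 2, 3, 4, 5) + c(10, 20)

# C
data.frame(x = 1:10, y = c(1, 2, 3, 4, 5))

# D
x <- 1:12
x[] <- c(1, 2, 3, 4)

# E
matrix(1:5, nrow = 5, ncol = 5)
```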
📝 Exercise 2: Fix the Code
Debug these recycling problems:
# Problem 1
students <- 1:25
groups <- c("A", "B", "C")
data.frame(student = students, group = groups)
# Problem 2
values <- rnorm(100)
weights <- c(1, 2, 3)
weighted <- values * weights
# Problem 3
df <- data.frame(id = 1:20)
summary_stats <- c(mean = 50, sd = 10, n = 20)
df$mean <- summary_stats["mean"]
📝 Exercise 3: Safe Operations
Write a function safe_add(x, y) that:
1. Checks if lengths match
2. If not, asks user what to do:
- Error
- Recycle shorter
- Trim longer
- Extend with NA
3. Performs the operation
4. Returns result with attribute showing what was done
📝 Exercise 4: Real World
You have exam scores for 100 students across 4 quarters:
scores_q1 <- rnorm(100, mean = 75, sd = 10)
scores_q2 <- rnorm(98, mean = 78, sd = 10) # 2 students dropped
scores_q3 <- rnorm(102, mean = 80, sd = 10) # 2 new students
scores_q4 <- rnorm(100, mean = 82, sd = 10)
Create a data frame with:
- All students who completed at least one quarter
- NA for missing scores
- Calculate average score per student
5.13 Exercise Answers
Exercise 1:
# A - Works, silent (4 is multiple of 2)
c(1, 2, 3, 4) + c(10, 20)
#> [1] 11 22 13 24
# B - Works, warns (5 not multiple of 2)
c(1, 2, 3, 4, 5) + c(10, 20)
#> Warning in c(1, 2, 3, 4, 5) + c(10, 20): longer object length is not a multiple
#> of shorter object length
#> [1] 11 22 13 24 15
# C - Works, silent (10 is a multiple of 5, so y is recycled)
tryCatch(
  data.frame(x = 1:10, y = c(1, 2, 3, 4, 5)),
  error = function(e) message("Error: ", e$message)
)
#> x y
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 5
#> 6 6 1
#> 7 7 2
#> 8 8 3
#> 9 9 4
#> 10 10 5
# D - Works, silent (12 is multiple of 4)
x <- 1:12
x[] <- c(1, 2, 3, 4)
x
#> [1] 1 2 3 4 1 2 3 4 1 2 3 4
# E - Works, silent (25 is multiple of 5)
matrix(1:5, nrow = 5, ncol = 5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 1 1 1
#> [2,] 2 2 2 2 2
#> [3,] 3 3 3 3 3
#> [4,] 4 4 4 4 4
#> [5,] 5 5 5 5 5
Exercise 2:
# Problem 1 - Recycle groups explicitly
students <- 1:25
groups <- c("A", "B", "C")
data.frame(
  student = students,
  group = rep(groups, length.out = length(students))
)
#> student group
#> 1 1 A
#> 2 2 B
#> 3 3 C
#> 4 4 A
#> 5 5 B
#> 6 6 C
#> 7 7 A
#> 8 8 B
#> 9 9 C
#> 10 10 A
#> 11 11 B
#> 12 12 C
#> 13 13 A
#> 14 14 B
#> 15 15 C
#> 16 16 A
#> 17 17 B
#> 18 18 C
#> 19 19 A
#> 20 20 B
#> 21 21 C
#> 22 22 A
#> 23 23 B
#> 24 24 C
#> 25 25 A
# Problem 2 - Make intention clear
values <- rnorm(100)
weights <- c(1, 2, 3)
weights_full <- rep(weights, length.out = length(values))
weighted <- values * weights_full
# Problem 3 - Extract scalar properly
df <- data.frame(id = 1:20)
summary_stats <- c(mean = 50, sd = 10, n = 20)
df$mean <- summary_stats[["mean"]] # Single value
Exercise 3:
safe_add <- function(x, y, action = c("error", "recycle", "trim", "extend")) {
  action <- match.arg(action)
  if (length(x) == length(y)) {
    result <- x + y
    attr(result, "action") <- "none_needed"
    return(result)
  }
  if (action == "error") {
    stop("Lengths don't match: ", length(x), " vs ", length(y))
  }
  if (action == "recycle") {
    max_len <- max(length(x), length(y))
    x <- rep(x, length.out = max_len)
    y <- rep(y, length.out = max_len)
    result <- x + y
    attr(result, "action") <- "recycled"
  }
  if (action == "trim") {
    min_len <- min(length(x), length(y))
    result <- x[1:min_len] + y[1:min_len]
    attr(result, "action") <- "trimmed"
  }
  if (action == "extend") {
    max_len <- max(length(x), length(y))
    x <- c(x, rep(NA, max_len - length(x)))
    y <- c(y, rep(NA, max_len - length(y)))
    result <- x + y
    attr(result, "action") <- "extended"
  }
  return(result)
}
# Test
safe_add(1:5, 1:3, "recycle")
#> [1] 2 4 6 5 7
#> attr(,"action")
#> [1] "recycled"
Exercise 4:
# Create scores with different lengths
set.seed(123)
scores_q1 <- rnorm(100, mean = 75, sd = 10)
scores_q2 <- rnorm(98, mean = 78, sd = 10)
scores_q3 <- rnorm(102, mean = 80, sd = 10)
scores_q4 <- rnorm(100, mean = 82, sd = 10)
# Find max number of students
max_students <- max(length(scores_q1), length(scores_q2),
                    length(scores_q3), length(scores_q4))
# Extend all to max length with NA
extend_with_na <- function(x, target_len) {
  c(x, rep(NA, target_len - length(x)))
}
# Create data frame
df <- data.frame(
  student_id = 1:max_students,
  q1 = extend_with_na(scores_q1, max_students),
  q2 = extend_with_na(scores_q2, max_students),
  q3 = extend_with_na(scores_q3, max_students),
  q4 = extend_with_na(scores_q4, max_students)
)
# Calculate average (ignoring NAs)
df$average <- rowMeans(df[, c("q1", "q2", "q3", "q4")], na.rm = TRUE)
# Keep only students with at least one score
df <- df[!is.nan(df$average), ]
head(df)
#> student_id q1 q2 q3 q4 average
#> 1 1 69.39524 70.89593 73.88834 74.84758 72.25677
#> 2 2 72.69823 80.56884 68.14520 74.47311 73.97134
#> 3 3 90.58708 75.53308 101.98810 72.61461 85.18072
#> 4 4 75.70508 74.52457 93.12413 71.47487 78.70716
#> 5 5 76.29288 68.48381 77.34855 77.62840 74.93841
#> 6 6 92.15065 77.54972 85.43194 85.31179 85.11103