Chapter 9 Useful tips
Following the principles detailed above should make you a better programmer, so we hope you will stick to them. To help you along the way just that tiny bit more, here are a few additional tips.
9.1 Break it down
Maybe you find yourself struggling with a task and turn to StackOverflow for help. Maybe you manage to find a ready-made answer to your problem or maybe some helpful soul writes one just for you. And maybe it works but you don’t quite understand why.
These things happen and sometimes a line of code can appear quite obscure. Take for example:
for (i in 1:10) eval(parse(text = paste0(
"df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i ,", 0, ", i, "), ncol = i))")))
When faced with a bit of code like this, it is generally a good idea to try to reverse-engineer it. Let’s give it a go.
First, we can see that this is a for loop that repeats itself 10 times: it starts by assigning the value of 1 to the iterator object i, then executes the code, increments i by 1 and repeats until the code has been run with i equal to 10.
So to see what the code inside the loop does, we need to set i to some value (1 is a reasonable choice).
i <- 1
Now, let’s start by running the code from the innermost command outwards:
paste0("df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i ,", 0, ", i, "), ncol = i))")
## [1] "df_1 <- as.data.frame(matrix(rnorm(100 * 1, 0, 1), ncol = i))"
Right, so the first command created a character string that looks like a command. Let’s break it down even further:
rnorm(100 * 1, 0, 1)
## [1] -1.047907263 0.334647380 0.435923495 -0.480989597 1.702518182
## [6] -0.370730068 1.052210386 1.108929130 0.018052376 1.279100669
## [11] -1.504444115 0.424034485 -1.204517636 -0.021164629 1.211853449
## [16] 0.332288148 1.078427383 1.025987337 0.172459867 0.795208031
## [21] 0.440109993 -0.827218657 0.596764479 0.745033312 -3.025576114
## [26] -1.762091803 0.413914646 0.845674843 -0.694720934 -0.308303527
## [31] 1.195282698 -0.911274774 0.115036852 0.539872015 2.341801712
## [36] -0.190848300 0.461041703 -0.184094516 2.208509955 1.044633826
## [41] -0.786850302 -0.054910095 0.252311968 1.319205985 -0.364450048
## [46] 1.001338090 -1.540652875 -0.669952228 0.355237452 0.245582746
## [51] 0.815361338 -1.470188139 0.922409071 1.359885235 0.047637818
## [56] -0.139715380 0.536770964 0.714087608 -2.007776828 1.353881648
## [61] 1.388877526 1.640218197 -0.962296390 -1.649831027 -0.336591424
## [66] -0.524440172 -1.461580211 1.786026822 -0.081735195 0.550506575
## [71] -0.790132175 -0.778848364 -1.538555873 0.486646149 -1.480791963
## [76] 1.274834396 -0.381704736 -1.392629198 1.840397983 1.874917568
## [81] 0.445369658 -0.485186904 1.255907351 1.095966740 -0.054362077
## [86] 0.064135598 -0.832767137 1.154166817 -0.009764447 -1.420798097
## [91] -0.666841665 -0.260189689 -1.241116990 -0.312476256 0.585133232
## [96] 0.220116142 0.321966905 -0.125602859 0.144848682 -1.150526461
OK, this is easy. The first bit generates \(100 \times i\) random numbers drawn from a normal distribution with a mean of zero and a standard deviation of i. Let’s move one layer out:
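To check this reading for a value other than 1, here is a quick sketch. The seed, the stand-in name k (used so that i itself is left untouched) and the choice of 3 are arbitrary assumptions, not part of the original code:

```r
set.seed(42)   # arbitrary seed, only so that the check is reproducible
k <- 3         # hypothetical stand-in for i; any value from 1 to 10 would do
x <- rnorm(100 * k, mean = 0, sd = k)

length(x)      # 100 * k values, i.e. 300
mean(x)        # close to 0, but not exactly 0 because the sample is random
sd(x)          # close to k, for the same reason
```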
# printout truncated to first 10 lines
matrix(rnorm(100 * 1, 0, 1), ncol = i)
## [,1]
## [1,] 0.03877248
## [2,] -0.69695493
## [3,] 0.55118318
## [4,] -0.40594141
## [5,] -0.32389321
## [6,] -0.99335789
## [7,] -1.93083243
## [8,] -0.85998697
## [9,] 1.77928830
## [10,] 1.37858208
## [ reached getOption("max.print") -- omitted 90 rows ]
This command puts those numbers into a matrix with 100 rows and i columns. Next:
df_1 <- as.data.frame(matrix(rnorm(100 * 1, 0, 1), ncol = i))
This line converts the matrix into a data.frame and stores it in an object called “df_i”, where i is replaced by its current value (here, df_1). Remember, i takes values of 1-10, increasing by 1 each time the loop is repeated.
All good thus far but why is the command a character string (in “quotes”)? What is that good for? Well, it turns out that the parse() function can take a string with valid R code inside and turn it into an expression:
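The matrix and data.frame steps can be checked in the same way for a larger value. This sketch again uses a hypothetical stand-in k instead of i, and the object names m and df_k are made up for illustration:

```r
k <- 3                                        # hypothetical stand-in for i
m <- matrix(rnorm(100 * k, 0, k), ncol = k)   # 300 numbers spread over k columns
dim(m)                                        # 100 rows, 3 columns

df_k <- as.data.frame(m)                      # the matrix has no column names,
names(df_k)                                   # so R supplies "V1" "V2" "V3"
is.data.frame(df_k)                           # TRUE
```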
parse(text = paste0("df_", i, " <- as.data.frame(matrix(rnorm(100 * ",
i ,", 0, ", i, "), ncol = i))"))
## expression(df_1 <- as.data.frame(matrix(rnorm(100 * 1, 0, 1), ncol = i)))
This expression can then be evaluated using the eval() function:
eval(parse(text = paste0(
"df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i , ", 0, ", i, "), ncol = i))")))
# printout truncated
df_1
## V1
## 1 -0.2964653
## 2 -1.1495345
## 3 1.4206396
## 4 0.2983326
## 5 1.8263633
## 6 -0.8628447
## 7 1.1240684
## 8 -2.1283718
## 9 -0.2896466
## 10 -0.5050959
## [ reached 'max' / getOption("max.print") -- omitted 90 rows ]
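The parse()/eval() pair may be easier to see on a much smaller command. In this sketch the string and the object names cmd, expr and total are made up for illustration:

```r
cmd <- "total <- 1 + 2"      # just a character string; nothing is computed yet
expr <- parse(text = cmd)    # the string turned into an unevaluated expression
class(expr)                  # "expression"

eval(expr)                   # only now is the assignment actually carried out
total                        # 3
```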
So what the entire loop does is create 10 data frames named df_1 to df_10, each containing 100 rows and a different number of columns (1 for df_1, 6 for df_6 etc.) with random numbers. Moreover, each data.frame contains random numbers with different standard deviations.
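One way to confirm this summary is to run the loop and then collect the dimensions of the ten objects it created. The helper names dims and k below are assumptions introduced for this check only:

```r
# The original loop, unchanged apart from line breaks
for (i in 1:10) eval(parse(text = paste0(
  "df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i, ", 0, ", i, "), ncol = i))")))

# Look each data frame up by name and record its number of rows and columns
dims <- sapply(1:10, function(k) dim(get(paste0("df_", k))))
rownames(dims) <- c("rows", "cols")
dims   # first row is 100 throughout, second row runs from 1 to 10
```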
And so, just like that, with a single line of code we can create 10 (or more!) different R objects with different properties. Cool, isn’t it? We hope this example demonstrates how, using systematic reverse-engineering, you can come to understand even daunting-looking code with functions you haven’t seen before.
9.2 Handy functions that return logicals
Finally, here are some useful functions with which you might want to familiarise yourself. They will make cleaning your data much easier.
- ==, takes a vector, matrix, or a data.frame and compares every element thereof to a single value (or, element by element, to a vector of the same length, as in the example below). Returns a logical vector (a logical matrix if the input is a matrix or data.frame) with TRUE for elements that are equal to the compared value and FALSE otherwise. Comparing NA returns NA.
  c(1:5, NA) == c(100, 2, 2, 8, 5, 9)
  ## [1] FALSE TRUE FALSE FALSE TRUE NA
- <, same as ==, but TRUE is returned if element is less than the compared value.
- >, same as ==, but TRUE is returned if element is greater than the compared value.
- <=, same as ==, but TRUE is returned if element is less than or equal to the compared value. In other words, it is a negation of (complementary operation to) >.
- >=, same as ==, but TRUE is returned if element is greater than or equal to the compared value. Negation of <.
- %in%, same as ==, but can take a vector on the right hand side. Each element of the vector/matrix/data.frame to the left is compared to each element of the vector to the right. Unlike ==, it returns FALSE rather than NA for a missing value that has no match. For example:
  c(1:5, NA) %in% c(100, 4, 2, 8)
  ## [1] FALSE TRUE FALSE TRUE FALSE FALSE
- all functions that begin with ‘is’, e.g.:
  - is.na(), takes a vector, matrix, or a data.frame and returns a logical vector (or matrix) with TRUE if given element is an NA and FALSE otherwise.
  - is.numeric(), takes any object and returns TRUE if it is a numeric vector and FALSE otherwise.
  - is.factor(), is.matrix(), is.data.frame(), is.list(), same as is.numeric() but return TRUE if the object provided is a factor, matrix, data.frame, or list, respectively.
  - isTRUE(), returns a single TRUE if the expression provided evaluates to TRUE and a single FALSE otherwise. Only isTRUE(TRUE) returns TRUE. isTRUE(FALSE), isTRUE(c(TRUE, TRUE)) and anything else returns FALSE. Works with NAs so can be useful for combining with logical operators that return NA when comparing missing values. For example:
    NA > 4
    ## [1] NA
    isTRUE(NA > 4)
    ## [1] FALSE
- any(), takes a logical vector and returns TRUE if any of its elements equals TRUE, and FALSE otherwise, e.g., any(1:5 > 4) returns TRUE.
- all(), like any() but returns TRUE only if all of the elements of the vector provided are TRUE.
- all.equal(), takes two objects and returns TRUE if they are equal (allowing for tiny numerical differences) and a character vector describing all discrepancies otherwise. Sensitive to attributes so all.equal(1:5, factor(1:5)) does not return TRUE. Good to use along with isTRUE()!
  all.equal(df, df)
  ## [1] TRUE
  all.equal(df, my_list)
  ## [1] "Names: 3 string mismatches"
  ## [2] "Attributes: < names for target but not for current >"
  ## [3] "Attributes: < Length mismatch: comparison on first 0 components >"
  ## [4] "Length mismatch: comparison on first 3 components"
  ## [5] "Component 1: Lengths: 5, 20"
  ## [6] "Component 1: Attributes: < target is NULL, current is list >"
  ## [7] "Component 1: target is numeric, current is matrix"
  ## [8] "Component 2: Modes: numeric, character"
  ## [9] "Component 2: target is numeric, current is character"
  ## [10] "Component 3: Modes: numeric, list"
  ## [ reached getOption("max.print") -- omitted 4 entries ]
  # use with isTRUE() if T/F desired
  isTRUE(all.equal(1:5, factor(1:5)))
  ## [1] FALSE
- &, “AND” takes two Booleans and returns TRUE if both of them are TRUE, FALSE if at least one of them is FALSE, and NA otherwise (that is, when one is NA and the other is TRUE or NA). Can be applied over two logical vectors of the same length:
  c(T, T, F) & c(T, T, T)
  ## [1] TRUE TRUE FALSE
- |, “OR” is the same as & but returns TRUE if either or both of the two compared elements is TRUE.
- xor(), “exclusive OR” is same as above but returns TRUE only if either the first or the second, but not both of the two compared elements, is TRUE.
  xor(c(T, F, F), c(T, F, T))
  ## [1] FALSE FALSE TRUE
- && and ||, single-element versions of & and |, meant for conditions such as those in if() statements. In older versions of R they only compared the first element of both of the vectors provided (i.e., x[1] vs y[1]), which is where the output below comes from. Since R 4.3.0, giving them a vector longer than one is an error:
  c(T, F, F) || c(T, F, T)
  ## [1] TRUE
- all of the above can be negated using the ‘!’ operator, e.g.:
  - x != y
  - !x > y
  - !is.na(x)
  - !any(is.na(x)) is equivalent to all(!is.na(x))
  - !(x & y) is equivalent to xor(x, y) | (!x & !y)
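Some of these behaviours are easy to misremember, so here is a short sketch that can be pasted into the console. The vector x is made-up data used only for illustration:

```r
x <- c(3, NA, 7)   # hypothetical data with one missing value

# Comparisons involving NA give NA; isTRUE() turns that into a plain FALSE
x > 4              # FALSE NA TRUE
isTRUE(x[2] > 4)   # FALSE

# A definite value on one side can settle & and | even if the other side is NA
FALSE & NA         # FALSE
TRUE & NA          # NA
TRUE | NA          # TRUE

# %in% never returns NA
x %in% c(7, 8)     # FALSE FALSE TRUE

# == is exact, whereas all.equal() tolerates tiny floating-point differences
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```

Wrapping a comparison in isTRUE() is particularly handy inside if(), which stops with an error when its condition is NA.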