Chapter 1 Review of R Programming

Computers can’t magically process every task we want; they are just mechanical devices driven by electrical impulses. Their true ingenuity comes from people—engineers who build them and programmers who instruct them.

Before we study the topics of computational statistics, an important question to ask is: how do computers process tasks, and how can we write code that our computers can execute to carry out those tasks?

This chapter refreshes essential R programming concepts, ensuring that readers have the necessary foundation for the more advanced topics in the book.

Primitive Data Types

R has several fundamental types that form the building blocks for more complex objects.
  • Numeric (double by default): 3.14, 42

  • Integer: 42L

  • Character: "Hello"

  • Logical: TRUE, FALSE

  • Complex: 1 + 2i

    Example:

    class(3.14)
    class(42L)
    class("Hello")
    class(TRUE)
    class(1+2i)
    ## [1] "numeric"
    ## [1] "integer"
    ## [1] "character"
    ## [1] "logical"
    ## [1] "complex"
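As a small supplementary sketch (not part of the original example), typeof() reports how a value is stored internally, which makes the "double by default" remark visible. The variable names are only illustrative.

```r
# class() reports "numeric" for doubles; typeof() shows the storage type.
type_plain   <- typeof(42)      # "double": a plain number is stored as a double
type_suffix  <- typeof(42L)     # "integer": the L suffix requests an integer
type_decimal <- typeof(3.14)    # "double"

# Both doubles and integers count as numeric.
plain_is_numeric  <- is.numeric(42)    # TRUE
suffix_is_numeric <- is.numeric(42L)   # TRUE

# Without the L suffix, 42 is not an integer as far as R is concerned.
plain_is_integer <- is.integer(42)     # FALSE
```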

Operations

Operations manipulate data and produce results.

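The following are examples of “infix” operators, which are written between their two operands:

  • Arithmetic: +, -, *, /, ^ (also written **), %% (remainder), %/% (integer division)

    Input: Numeric/Integer/Logical; Output: Numeric/Integer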

    1 + 1
    1 - 1
    1 * 2
    8 / 2
    8 ** 2
    8 ^ 2
    9 %% 3
    10 %% 3
    9 %/% 3
    10 %/% 3
    ## [1] 2
    ## [1] 0
    ## [1] 2
    ## [1] 4
    ## [1] 64
    ## [1] 64
    ## [1] 0
    ## [1] 1
    ## [1] 3
    ## [1] 3
  • Logical: &, |, plus the prefix operator ! and the function xor()

    Input: Logical/Numeric/Integer; Output: Logical

    # Negating TRUE gives FALSE
    !T
    
    # A conjunction (and) is TRUE only if both operands are TRUE
    T & T
    T & F
    
    # A disjunction (or) is TRUE if at least one operand is TRUE
    T | F
    F | F
    ## [1] FALSE
    ## [1] TRUE
    ## [1] FALSE
    ## [1] TRUE
    ## [1] FALSE
  • Relational: <, <=, >, >=, ==, !=

    Input: any data type; Output: Logical

    1>1
    1<1
    1==1
    1<=1
    1>=1
    1!=1
    ## [1] FALSE
    ## [1] FALSE
    ## [1] TRUE
    ## [1] TRUE
    ## [1] TRUE
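    ## [1] FALSE
    "ant" < "zebra"
    "A" > "a"    # strings are compared using the locale's collation order, so this result can differ between systems
    ## [1] TRUE
    ## [1] TRUE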
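The input and output types listed for each operator family can be checked directly. The following is a minimal sketch with illustrative variable names; it only uses base R operators.

```r
# Logical values act as 1 (TRUE) and 0 (FALSE) in arithmetic.
sum_logical <- TRUE + TRUE        # 2

# Logical operators treat nonzero numbers as TRUE and zero as FALSE.
and_numeric <- 5 & 0              # FALSE
or_numeric  <- 5 | 0              # TRUE

# xor() is TRUE when exactly one of its arguments is TRUE.
xor_mixed <- xor(TRUE, FALSE)     # TRUE
xor_same  <- xor(TRUE, TRUE)      # FALSE

# Integer division and remainder fit together:
# 10 is rebuilt from its quotient and remainder when divided by 3.
rebuilt <- 3 * (10 %/% 3) + (10 %% 3)   # 10
```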

Nesting of Infix Operators

R follows the standard order of operations:

  • 2 + 3 * 4 + 5 evaluates 3*4 first.

  • T|T&F evaluates T&F first.

For complex calculations, we can use parentheses to control order of evaluation, ensuring the intended operations happen first.

Example: Arithmetic Nesting

(2 + 3) * 4 ^ (1/2)

is different from

2 + 3 * 4 ^ 1 / 2
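To make the difference concrete, the two expressions can be evaluated side by side. This is a short sketch; the variable names are only illustrative.

```r
# Parentheses force the addition and the fractional exponent to happen first.
with_parens <- (2 + 3) * 4 ^ (1/2)     # 5 * sqrt(4) = 10

# Without parentheses: ^ first, then * and / from left to right, then +.
without_parens <- 2 + 3 * 4 ^ 1 / 2    # 2 + (3 * 4) / 2 = 8
```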

Example: Logical and Relational Nesting

(x > y) & (y > 0 | y < -1)

R evaluates this expression in three steps:
  • first, the relational operations (x > y, y > 0, y < -1);
  • next, the | (or) operation inside the parentheses;
  • last, the & (and) operation.
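The evaluation order described above can be traced step by step. In this sketch, the values of x and y are hypothetical, chosen so that removing the parentheses changes the result.

```r
# Hypothetical values, chosen so that the parentheses matter.
x <- -5
y <- -2

# Step 1: the relational operations.
x_gt_y  <- x > y      # FALSE
y_pos   <- y > 0      # FALSE
y_lt_m1 <- y < -1     # TRUE (keep the space: y<-1 would assign 1 to y)

# Step 2: the | inside the parentheses.
inner <- y_pos | y_lt_m1           # TRUE

# Step 3: the & outside the parentheses.
with_parens <- x_gt_y & inner      # FALSE

# Without parentheses, & is evaluated before |, so the grouping changes:
# (x > y & y > 0) | (y < -1)
without_parens <- x > y & y > 0 | y < -1   # TRUE
```

Because & takes precedence over |, the parentheses around the | operation are what make the expression mean "x exceeds y, and y is either positive or below -1".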

Expressions

Expressions are evaluated before instructions are executed.


An expression is a combination of operators, constants and variables. An expression may consist of one or more operands, and zero or more operators to produce a value.

Below is some sample code.

x <- 2 + 4  # Line 1
y <- x - 1  # Line 2
print(2*x*y)    # Line 3
## [1] 60

In the first line, 2 + 4 is an expression. It is evaluated first, before the whole instruction (the assignment) is executed.

In the second line, x - 1 is evaluated before storing the value to y.

In the last line, 2*x*y is evaluated before the print instruction is executed.


Control Structures

Instructions are executed sequentially.


In most (if not all) programming languages, instructions are executed sequentially, from top to bottom. For example, if we have the following instructions, line 1 will be executed first, followed by line 2.

2 + 4   # Line 1
6 / 2   # Line 2
## [1] 6
## [1] 3

The output confirms this: the result of line 1 (6) is printed before the result of line 2 (3).

The following is another example, in which two assignment statements target the same object x.

x <- 10     # Line 1
x <- 2 + 4  # Line 2
x/2         # Line 3

The computer is not confused about what x means in line 3. Because instructions are executed sequentially, line 2 has already replaced the value 10 with 6 by the time line 3 runs, so x/2 evaluates to 3.

The first rule of programming we learned is that instructions are executed from top to bottom. On its own, this does not give us much power to do complex tasks. Previously, we also learned that a return statement can alter that sequential execution.

We take this a step further with control structures.

If we want to skip some steps based on some conditions, we use selection control structures. These are the if-else statements that are common in most programming languages.

On the other hand, if we want to execute commands repeatedly, we use repetition control structures. These include the for loop, the while loop, and other structures that allow the computer to keep executing commands while a condition is satisfied. (Some languages also offer an until loop; R provides repeat with break instead.)
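A minimal sketch of both kinds of control structure in R follows. The values (score, the loop bounds) and variable names are hypothetical and serve only to illustrate the syntax.

```r
# Selection: only the branch whose condition holds is executed.
score <- 72                       # hypothetical value
if (score >= 90) {
  label <- "high"
} else if (score >= 60) {
  label <- "middle"
} else {
  label <- "low"
}

# Repetition with for: visit each element of a sequence.
total <- 0
for (i in 1:5) {
  total <- total + i              # accumulates 1 + 2 + 3 + 4 + 5
}

# Repetition with while: continue as long as the condition is TRUE.
n <- 1
while (n < 100) {
  n <- n * 2                      # 1, 2, 4, ..., stops at the first value >= 100
}

# R has no "until" keyword; repeat with break plays that role.
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break           # leave the loop once the condition is met
}
```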


Environment

Functions as abstractions

Each time a function is called (not when it is defined), a new environment is created. The call does two things:

  • first, it interrupts the sequential execution of instructions in the parent environment,

  • then proceeds with the execution of the instructions defined inside the function in a new environment.
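One way to see that every call creates its own environment is to have a function hand back the environment it is running in. In this sketch, show_env() is a hypothetical helper; environment() and identical() are base R functions.

```r
# show_env() is a hypothetical helper: called with no arguments,
# environment() returns the environment of the current function call.
show_env <- function() {
  environment()
}

env_first  <- show_env()
env_second <- show_env()

# Both results are environments, but they are two distinct ones,
# because each call created a fresh environment.
both_are_envs <- is.environment(env_first) && is.environment(env_second)  # TRUE
same_env      <- identical(env_first, env_second)                         # FALSE
```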

What is “abstraction” in programming?

In object-oriented programming, abstraction is a fundamental concept that simplifies complex systems by focusing on essential characteristics while concealing unnecessary details.

It is a way of hiding complicated details from the end user.

There is a principle in programming called “don’t repeat yourself” (DRY), which holds that it is good practice NOT to repeat the same block of code, or the same information, in several places.

What we want is to be able to “abstract” this information, i.e., define an object we can call multiple times without having to reveal all of the unnecessary detail.

Creating “functions” helps us in abstraction.
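As an illustration of the DRY idea, the sketch below first repeats a formula and then wraps it in a function. The function name rescale01 and the sample vectors are hypothetical, not taken from the text.

```r
a <- c(2, 4, 6)
b <- c(10, 20, 30)

# Without abstraction: the same formula is typed out for each vector.
a_scaled <- (a - min(a)) / (max(a) - min(a))
b_scaled <- (b - min(b)) / (max(b) - min(b))

# With abstraction: the detail is written once and reused by name.
rescale01 <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}

a_scaled_fn <- rescale01(a)   # 0.0 0.5 1.0
b_scaled_fn <- rescale01(b)   # 0.0 0.5 1.0
```

If the formula ever needs to change, only the body of the function has to be edited, and a reader of the calling code does not need to see the arithmetic at all.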

Function in R

Note that a function is defined in R with the following syntax:

<NAME OF FUNCTION> <- function(<ARGUMENT 1>, <ARGUMENT 2>, ...){
  <FIRST INSTRUCTION OF THE FUNCTION>
  <SECOND INSTRUCTION OF THE FUNCTION>
  ...
}

This means that <NAME OF FUNCTION> is an object of class "function", regardless of the instructions inside it.
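A short sketch of this point: two functions with different bodies are both objects of the same class. The names add_two and multiply_two are hypothetical.

```r
# Two hypothetical functions with different instructions inside.
add_two <- function(a, b) {
  a + b
}

multiply_two <- function(a, b) {
  a * b
}

# Both names are bound to objects of class "function".
class_add      <- class(add_two)        # "function"
class_multiply <- class(multiply_two)   # "function"

# Calling the function runs its instructions and returns the last value.
sum_result     <- add_two(2, 3)         # 5
product_result <- multiply_two(2, 3)    # 6
```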

More details about this in Section ?? .

Questions

  1. What is the value of x in the global environment after this code runs?

    x <- 10
    update_x <- function(x) {
        x <- 20
    }
    update_x(x)
  2. What will be bound to a, b, and x in the global environment after the function is called?

    x <- 3
    a <- 5
    func <- function(b) {
        x <- a + b
        return(x)
    }

    func(2)

Data Structures in R

Data structures allow us to represent a collection of individual objects as a single entity.

Suppose we have the following dataset.

| y | assignedSex | age |
|---|-------------|-----|
| 3 | F           | 30  |
| 2 | M           | 28  |
| 7 | F           | 35  |

We could assign each cell value to its own object:

y1 <- 3
y2 <- 2
y3 <- 7
assignedSex1 <- "F"
assignedSex2 <- "M"
assignedSex3 <- "F"
age1 <- 30
age2 <- 28
age3 <- 35

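Nothing in these nine separate objects records which values belong to the same row or the same column. Data structures allow us to make sense of different values that have relationships with each other.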

data <- data.frame(y = c(3, 2, 7),
                   assignedSex = c("F", "M", "F"),
                   age = c(30, 28, 35))
print(data)
##   y assignedSex age
## 1 3           F  30
## 2 2           M  28
## 3 7           F  35



The following are examples of data structures in R:

Atomic vector

The atomic vector is the most fundamental data structure in R. It is a one-dimensional structure that can contain only a single type of data (e.g., only numbers or only character strings). Even a single value such as 3.14 or "Hello" is an atomic vector of length one.


``` r
my_log <- c(TRUE, FALSE, T, F, NA)
my_int <- c(1L, 2L, 3L, 4L, NA)
my_dbl <- c(1.25, 1.50, 1.75, 2.00, NA)
my_chr <- c("a", "b", "c", "d", NA)
```


``` r
my_log
my_int
my_dbl
my_chr
```

```
## [1]  TRUE FALSE  TRUE FALSE    NA
## [1]  1  2  3  4 NA
## [1] 1.25 1.50 1.75 2.00   NA
## [1] "a" "b" "c" "d" NA
```
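Because an atomic vector holds a single type, R converts (coerces) mixed inputs to a common type. The sketch below illustrates this with hypothetical vectors; it is a supplement, not part of the original example.

```r
# Mixing types in c() coerces everything to one common type.
mixed_chr <- c(1, "a", TRUE)      # all become character: "1" "a" "TRUE"
mixed_dbl <- c(1L, 2.5, TRUE)     # all become double: 1.0 2.5 1.0

type_chr <- typeof(mixed_chr)     # "character"
type_dbl <- typeof(mixed_dbl)     # "double"

# A single value is itself an atomic vector of length one.
single_length    <- length(3.14)      # 1
single_is_atomic <- is.atomic(3.14)   # TRUE

# NA takes on the type of the vector it appears in.
type_with_na <- typeof(c(1L, 2L, NA)) # "integer"
```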

Matrix

An extension of the atomic vector to two dimensions. By default, matrix() fills the values column by column, as the output below shows.


``` r
mat <- matrix(data = 1:9, nrow = 3, ncol = 3)
print(mat)
```

```
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
```
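The fill order and element access of a matrix can be sketched as follows. The object names are illustrative; byrow is a documented argument of matrix().

```r
# Default: values are filled column by column.
mat_by_col <- matrix(1:6, nrow = 2, ncol = 3)
# byrow = TRUE fills row by row instead.
mat_by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

# dim() returns the number of rows and columns.
mat_dim <- dim(mat_by_col)            # 2 3

# Elements are addressed as [row, column].
elem_by_col <- mat_by_col[1, 2]       # 3
elem_by_row <- mat_by_row[1, 2]       # 2

# Leaving an index empty selects the whole row or column.
second_row <- mat_by_row[2, ]         # 4 5 6
```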

Array

An extension of the vector data structure to n dimensions.


``` r
# example: 3 dimensional array
# 3 rows, 4 columns, 2 layers
arr <- array(1:24, dim = c(3, 4, 2))  
print(arr)
```

```
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24
```
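Elements of an array are addressed with one index per dimension. The sketch below rebuilds the same 3 x 4 x 2 array so that it runs on its own; the other variable names are illustrative.

```r
# Same array as above: 3 rows, 4 columns, 2 layers.
arr <- array(1:24, dim = c(3, 4, 2))

arr_dim <- dim(arr)              # 3 4 2

# Index order is [row, column, layer].
first_layer_elem  <- arr[2, 3, 1]    # 8
second_layer_elem <- arr[2, 3, 2]    # 20

# Leaving indices empty keeps whole dimensions:
# the second layer on its own is a 3 x 4 matrix.
layer2     <- arr[, , 2]
layer2_dim <- dim(layer2)        # 3 4
```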

List

Like vectors, but each element may hold a different kind of object.


``` r
my_list <- list("hello world", my_dbl, mat, arr)
print(my_list)
```

```
## [[1]]
## [1] "hello world"
## 
## [[2]]
## [1] 1.25 1.50 1.75 2.00   NA
## 
## [[3]]
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## [[4]]
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24
```
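List elements can be named and then extracted in several ways. The sketch below uses a small hypothetical list (small_list) so that it runs on its own.

```r
# A hypothetical list whose elements are of different kinds.
small_list <- list(greeting = "hello world",
                   values   = c(1.25, 1.50),
                   grid     = matrix(1:4, nrow = 2))

# [[ ]] and $ extract the element itself.
greeting_elem <- small_list[[1]]        # "hello world"
values_elem   <- small_list$values      # 1.25 1.50

# [ ] keeps the result wrapped in a list of length one.
greeting_sub <- small_list[1]

class_elem <- class(greeting_elem)      # "character"
class_sub  <- class(greeting_sub)       # "list"

n_elements <- length(small_list)        # 3
```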

Data frame

A special type of list whose elements are atomic vectors of the same length, one per column.


``` r
names  <- c("Aris", "Bertio", "Condrado", "Dionisia", "Encinas")
age    <- c(22L, 20L, 21L, 18L, 19L)
grade  <- c(3.00, 2.25, 2.50, 1.00, 1.50)
class_data <- data.frame(names, age, grade)
```

The resulting data frame has 3 columns (variables) and 5 rows (observations).


``` r
class_data
```

##      names age grade
## 1     Aris  22  3.00
## 2   Bertio  20  2.25
## 3 Condrado  21  2.50
## 4 Dionisia  18  1.00
## 5  Encinas  19  1.50
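Since a data frame is a list of equal-length columns, it supports both list-style and matrix-style access. The sketch below rebuilds class_data so that it runs on its own; the derived variable names are illustrative.

```r
class_data <- data.frame(
  names = c("Aris", "Bertio", "Condrado", "Dionisia", "Encinas"),
  age   = c(22L, 20L, 21L, 18L, 19L),
  grade = c(3.00, 2.25, 2.50, 1.00, 1.50)
)

# A data frame is a list under the hood.
df_is_list <- is.list(class_data)       # TRUE

# dim() gives rows (observations) and columns (variables).
df_dim <- dim(class_data)               # 5 3

# List-style access: pull out one column as a vector.
mean_grade <- mean(class_data$grade)    # 2.05

# Matrix-style access: keep the rows that satisfy a condition.
older <- class_data[class_data$age >= 21, ]
older_names <- as.character(older$names)   # "Aris" "Condrado"
```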

In R, we are lucky to have several data structures, including data frames, which are well suited to working with data sets. The base data structures of R can be categorized by their number of dimensions and by whether they are homogeneous or heterogeneous.

| Data structure | Dimensions | Heterogeneous? |
|----------------|------------|----------------|
| Atomic vector  | 1          | No             |
| List           | 1          | Yes            |
| Matrix         | 2          | No             |
| Data frame     | 2          | Yes            |
| Array          | n          | No             |

The term “data structure” is something of a misnomer, because some of these structures, lists in particular, can hold not only data but also other objects, such as functions or other collections of objects.

simple_model <- lm(y ~ x, data=data.frame(y = c(1, 3, 5, 3), x = c(1, 2, 3, 4)))
summary(simple_model)
## 
## Call:
## lm(formula = y ~ x, data = data.frame(y = c(1, 3, 5, 3), x = c(1, 
##     2, 3, 4)))
## 
## Residuals:
##    1    2    3    4 
## -0.8  0.4  1.6 -1.2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.0000     1.8974   0.527    0.651
## x             0.8000     0.6928   1.155    0.368
## 
## Residual standard error: 1.549 on 2 degrees of freedom
## Multiple R-squared:    0.4,  Adjusted R-squared:    0.1 
## F-statistic: 1.333 on 1 and 2 DF,  p-value: 0.3675

Complex objects such as regression models are usually represented on our computers as a combination of different kinds of objects, not only data or values.
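The fitted model can be inspected to see this combination of objects. The sketch below refits the same regression so that it runs on its own; toy is a hypothetical name for the data frame, and the inspected components are those documented for objects returned by lm().

```r
# Refit the same model on the same four observations.
toy <- data.frame(y = c(1, 3, 5, 3), x = c(1, 2, 3, 4))
simple_model <- lm(y ~ x, data = toy)

# The model has its own class, but it is stored as a list.
model_class   <- class(simple_model)      # "lm"
model_is_list <- is.list(simple_model)    # TRUE

# Its components are objects of different kinds.
component_names <- names(simple_model)
coef_values     <- simple_model$coefficients       # numeric vector: 1.0 and 0.8
call_class      <- class(simple_model$call)        # "call": the code that was run
frame_is_df     <- is.data.frame(simple_model$model)  # TRUE: the data that were used
```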

Exercise

  1. On your device, open RStudio and create an R project. This will be your working environment.

  2. Explore how to work with R Markdown.

    Visit this link to learn more about text formatting and other capabilities of R Markdown:

    https://rmarkdown.rstudio.com/authoring_basics.html

Note that for our machine problems, I will require you to use R Markdown for easier documentation. Results must be knitted to PDF.


© 2025 Siegfred Roi L. Codia. All rights reserved.