MTH 209 Data Manipulation and Management

Lesson 3: Basic Data Structures (Vectors, Matrices, and Arrays)

Ying-Ju Tessa Chen
ychen4@udayton.edu
University of Dayton

Basic Data Structures

In this lesson, we will introduce the following data structures in R: vectors, matrices, and arrays.

Note: This lesson is based on the book: The Art of R Programming (Matloff (2011)).

Vectors - 1

The vector is the fundamental data type in R. In this session, we will focus on the first two topics below. The third topic will be introduced later this semester.

  1. Recycling: automatic extension of the shorter vector so that its length matches the longer one in an operation
  2. Filtering: extraction of the subset of a vector's elements that satisfy a condition
  3. Vectorization: applying a function to every element of a vector at once
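A minimal sketch of the three ideas, using small made-up vectors (the name `v` and all values are illustrative only):

```r
# Recycling: the shorter vector c(10, 20) is reused to match length 4
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# Filtering: keep only the elements that satisfy a condition
v <- c(5, 12, 3, 18)
v[v > 4]                    # 5 12 18

# Vectorization: sqrt() is applied to every element at once
sqrt(c(1, 4, 9))            # 1 2 3
```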

Vectors - 2

x <- c(147, 85, 21, 99, 38, 47, 1, 27)
x <- c(x[2:5], 217, 304, x[6])
x
## [1]  85  21  99  38 217 304  47
x[-c(1,4,8)] # Use negative subscripts to exclude certain elements. 
## [1]  21  99 217 304  47
length(x)
## [1] 7
x[-length(x)]
## [1]  85  21  99  38 217 304
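As a side note, base R's `head()` with a negative `n` gives the same result as `x[-length(x)]`, and a negative index larger than `length(x)` removes nothing. That is why `x[-c(1,4,8)]` above worked even though `x` has only 7 elements. A small sketch using the same `x`:

```r
x <- c(85, 21, 99, 38, 217, 304, 47)
# head() with a negative n drops that many elements from the end
head(x, -1)      # 85 21 99 38 217 304
# a negative index beyond length(x) is silently ignored
x[-10]           # all 7 elements
```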

Vectors - 3

c(4, 1, 5) + c(3, 2, 4, 7, 1) # recycling: c(4, 1, 5) is reused as 4, 1, 5, 4, 1 (R also warns because 5 is not a multiple of 3)
## [1]  7  3  9 11  2
c(1, 2, 3) * c(4, 1, 5)
## [1]  4  2 15
c(1, 3, 4) / c(4, 1, 5)
## [1] 0.25 3.00 0.80
c(1, 3, 4) %% c(4, 1, 5) # %% is the modulo operator: it returns the remainder of x / y
## [1] 1 0 4
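The sketch below, with illustrative values only, shows recycling when the longer length is a multiple of the shorter one (no warning), recycling of a single number, and the integer-division operator `%/%`, which pairs with `%%`:

```r
# Lengths 6 and 2: 6 is a multiple of 2, so recycling is silent
1:6 + c(0, 100)                 # 1 102 3 104 5 106

# A single number is a length-1 vector, recycled to every element
c(4, 1, 5) * 10                 # 40 10 50

# %/% is integer division; together with %% it satisfies
# x == y * (x %/% y) + x %% y
x <- c(1, 3, 4)
y <- c(4, 1, 5)
x %/% y                         # 0 3 0
y * (x %/% y) + x %% y          # 1 3 4
```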

Vectors - 4

7:19
##  [1]  7  8  9 10 11 12 13 14 15 16 17 18 19
15:6
##  [1] 15 14 13 12 11 10  9  8  7  6

Be aware of operator precedence: the colon operator : has higher precedence than binary + and -, so 1:10-1 is evaluated as (1:10) - 1.

1:10-1
##  [1] 0 1 2 3 4 5 6 7 8 9
1:(10-1)
## [1] 1 2 3 4 5 6 7 8 9
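A few more precedence cases worth knowing, sketched with arbitrary numbers (the name `n` is illustrative). In R, `^` and unary minus bind before `:`, while `*` binds after it:

```r
-2:2        # unary minus binds first: (-2):2 gives -2 -1 0 1 2
2 * 1:3     # ":" binds before "*": 2 * (1:3) gives 2 4 6
1:3^2       # "^" binds before ":": 1:9
n <- 5
1:(n - 1)   # parentheses make the intent explicit: 1 2 3 4
```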

Vectors - 5

seq(from = 0.1, to = 1, by = 0.1)
##  [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
seq(from = 0.1, to = 1, length.out = 5)
## [1] 0.100 0.325 0.550 0.775 1.000
seq(1, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(1, 10, 2)
## [1] 1 3 5 7 9

Note: By default, the first three positional arguments of seq() are from, to, and by, in that order.
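Two related base R helpers, `seq_len()` and `seq_along()`, are often safer than `1:n` when `n` might be 0. A short sketch (the names `n` and `v` are illustrative):

```r
n <- 0
1:n           # 1 0 (counts down, rarely what a loop wants)
seq_len(n)    # integer(0), an empty sequence
v <- c(10, 20, 30)
seq_along(v)  # 1 2 3, one index per element of v
# positional and named calls to seq() are equivalent
identical(seq(1, 10, 2), seq(from = 1, to = 10, by = 2))  # TRUE
```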

Vectors - 6

x <- rep(NA, 3)
x 
## [1] NA NA NA
rep(3, 10)
##  [1] 3 3 3 3 3 3 3 3 3 3
rep(c(1, 3, -2), 5)
##  [1]  1  3 -2  1  3 -2  1  3 -2  1  3 -2  1  3 -2
rep(c(1, 3, -2), each = 2)
## [1]  1  1  3  3 -2 -2
rep("Hello", 4)
## [1] "Hello" "Hello" "Hello" "Hello"
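`rep()` also accepts a vector for `times` (one count per element), and `each` and `times` can be combined. `rep_len()` repeats until a target length is reached. A small sketch with the same vector as above:

```r
rep(c(1, 3, -2), times = c(3, 1, 2))   # 1 1 1 3 -2 -2
rep(c(1, 3, -2), each = 2, times = 2)  # 1 1 3 3 -2 -2 1 1 3 3 -2 -2
rep_len(1:3, 7)                        # 1 2 3 1 2 3 1
```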

Vectors - 7

x <- c(68, 43, 86, 51, 88, 29, 61, 18, 22, 45)
any(x > 50)
## [1] TRUE
all(x < 75)
## [1] FALSE
x <- c(NA, x, NA, NA) # add three NAs to x
any(x < 10)
## [1] NA
any(x < 10, na.rm = TRUE) # na.rm = TRUE (abbreviated T) removes the NAs before testing
## [1] FALSE
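NAs also matter when filtering: a comparison with NA yields NA, and indexing with a logical NA returns NA. A minimal sketch with a short illustrative vector:

```r
x <- c(NA, 68, 43, 86, NA)
x[x > 50]                   # NA 68 86 NA: NA comparisons keep NA slots
x[which(x > 50)]            # 68 86: which() skips NA
x[!is.na(x) & x > 50]       # 68 86: explicit alternative
sum(x > 50, na.rm = TRUE)   # 2 elements exceed 50
```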

Matrices and Arrays - 1

A matrix is a two-dimensional rectangular data structure in R. Technically, a matrix is a vector with two additional attributes: the number of rows and the number of columns (stored together in its dim attribute).

Matrix row and column subscripts start with 1. The matrix \(A\) below has \(m\) rows and \(n\) columns. We say that the dimension of \(A\) is \(m\times n\).

\[A = \left( \begin{array}{cccc} A[1,1] & A[1,2] & \ldots & A[1,n]\\ A[2,1] & A[2,2] & \ldots & A[2,n]\\ \vdots & \vdots & \ddots & \vdots\\ A[m,1] & A[m,2] & \ldots & A[m,n] \end{array} \right)\]
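To see that a matrix is a vector plus a dim attribute, the sketch below attaches dimensions to a plain vector (the name `v` is illustrative). Because R stores elements column by column, `v[i, j]` for an \(m\times n\) matrix is the same element as `v[(j - 1) * m + i]`:

```r
v <- 1:6
dim(v) <- c(2, 3)   # attach the dim attribute: 2 rows, 3 columns
is.matrix(v)        # TRUE
attributes(v)       # $dim: 2 3
v[1, 2]             # 3
v[(2 - 1) * 2 + 1]  # 3, the same element by its vector position
```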

Matrices and Arrays - 2

X <- matrix(c(1, 3, 2, 4, 5, 8), nrow = 2)
X
##      [,1] [,2] [,3]
## [1,]    1    2    5
## [2,]    3    4    8
dim(X)
## [1] 2 3
nrow(X) # obtain the number of rows
## [1] 2
ncol(X) # obtain the number of columns
## [1] 3
matrix(c(1, 3, 2, 4, 5, 8), ncol=2, byrow = TRUE)
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## [3,]    5    8
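Filling by row is equivalent to filling by column and then transposing with `t()`. A quick check using the same six values (the names `vals` and `by_row` are illustrative):

```r
vals <- c(1, 3, 2, 4, 5, 8)
by_row <- matrix(vals, ncol = 2, byrow = TRUE)
# same 3 x 2 matrix as filling a 2 x 3 matrix by column and transposing
identical(by_row, t(matrix(vals, nrow = 2)))   # TRUE
```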

Matrices and Arrays - 3

A <- matrix(c(1, 2, 3, 4), ncol=2)
A %*% A # matrix multiplication
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22
-2 * A
##      [,1] [,2]
## [1,]   -2   -6
## [2,]   -4   -8
A + A
##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8
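Note that `*` on matrices is element-wise, while `%*%` is the matrix product: entry `[i, j]` of `A %*% A` is the sum of row `i` times column `j`. A sketch with the same `A`, also showing `t()` and `solve()` (the name `Ainv` is illustrative):

```r
A <- matrix(c(1, 2, 3, 4), ncol = 2)
A * A                  # element-wise: row 1 is 1 9, row 2 is 4 16
sum(A[1, ] * A[, 1])   # entry [1, 1] of A %*% A: 1*1 + 3*2 = 7
t(A)                   # transpose: rows become columns
Ainv <- solve(A)       # inverse exists because det(A) = 1*4 - 3*2 = -2
A %*% Ainv             # 2 x 2 identity, up to rounding
```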

Matrices and Arrays - 4

set.seed(1000) # set a random seed so the results are reproducible
A <- matrix(sample(1:100, 12), ncol=3) # draw 12 distinct integers from 1 to 100 to fill a 4 by 3 matrix
A
##      [,1] [,2] [,3]
## [1,]   68   88   22
## [2,]   43   29   45
## [3,]   86   61   38
## [4,]   51   18   33
A[2:3,]
##      [,1] [,2] [,3]
## [1,]   43   29   45
## [2,]   86   61   38
A[, - c(1,3)] # remove the 1st and 3rd columns; the one remaining column is returned as a vector
## [1] 88 29 61 18
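To keep the result as a one-column matrix instead of a vector, pass `drop = FALSE`. A sketch on a fixed 4 by 3 matrix (the name `B` is illustrative, used instead of the random `A`):

```r
B <- matrix(1:12, ncol = 3)        # columns 1:4, 5:8, 9:12
B[, -c(1, 3)]                      # 5 6 7 8 as a plain vector
B[, -c(1, 3), drop = FALSE]        # still a 4 x 1 matrix
dim(B[, -c(1, 3), drop = FALSE])   # 4 1
```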

Matrices and Arrays - 5

We can also assign values to submatrices. The following code chunk shows an example based on A given on the previous slide.

A
##      [,1] [,2] [,3]
## [1,]   68   88   22
## [2,]   43   29   45
## [3,]   86   61   38
## [4,]   51   18   33
A[, 1:2] <- matrix(c(1, 2, 3, 4, 7, 5, 8, -3), ncol=2)
A
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    2    5   45
## [3,]    3    8   38
## [4,]    4   -3   33
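Assignment to a submatrix also recycles a single value, and a logical matrix can select the elements to replace. A sketch on an illustrative fixed matrix `B`:

```r
B <- matrix(1:12, ncol = 3)   # columns 1:4, 5:8, 9:12
B[c(1, 4), 2:3] <- 0          # one value recycled over the 2 x 2 block
B[B > 10] <- NA               # replace every element greater than 10 with NA
B
```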

Matrices and Arrays - 6

A
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    2    5   45
## [3,]    3    8   38
## [4,]    4   -3   33
A[A[,2] >= 6, ]
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    3    8   38
A[which(A[,2] >= 6), ]
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    3    8   38

A[,2] >= 6 tests whether each element of the 2nd column is at least 6 and returns a logical vector with one value per row. The which() function reports the indices of the TRUE values (here, rows 1 and 3).
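Row conditions can be combined with `&` (and) or `|` (or), and `which(..., arr.ind = TRUE)` reports matching (row, column) pairs. A sketch that rebuilds the current `A` from its printed values:

```r
A <- matrix(c(1, 2, 3, 4, 7, 5, 8, -3, 22, 45, 38, 33), ncol = 3)
A[A[, 2] >= 6 & A[, 3] > 30, ]   # only row 3: 3 8 38
A[A[, 2] >= 6 | A[, 3] > 40, ]   # rows 1, 2, and 3
which(A > 30, arr.ind = TRUE)    # rows 2, 3, 4 of column 3
```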

Matrices and Arrays - 7

Note that the filtering criterion can be based on a variable different from the one to which the filtering is applied. This is useful when we have more than one matrix (or data frame) and want to use conditions on one of them to extract values from another.

A
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    2    5   45
## [3,]    3    8   38
## [4,]    4   -3   33
z <- c(7, 8, 20, 31)
A[z %% 2 == 1, ]
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    4   -3   33

Here, z %% 2 == 1 checks whether each element of z is odd and returns a logical vector, which selects rows 1 and 4 of A.

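Since matrices are vectors, we can apply vector operations to them as well. In the example below, which() treats A as a vector stored in column-major order, so the indices 1 to 8 refer to all elements of the first and second columns.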

which(A < 20) 
## [1] 1 2 3 4 5 6 7 8
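A linear index `k` from `which()` can be converted back to a (row, column) pair. For an \(m\)-row matrix, row is `(k - 1) %% m + 1` and column is `(k - 1) %/% m + 1`; base R's `arrayInd()` does the same. A sketch with the same `A` (the names `k` and `m` are illustrative):

```r
A <- matrix(c(1, 2, 3, 4, 7, 5, 8, -3, 22, 45, 38, 33), ncol = 3)
k <- which(A < 0)                 # linear index 8 (the -3)
m <- nrow(A)
c(row = (k - 1) %% m + 1, col = (k - 1) %/% m + 1)   # row 4, col 2
arrayInd(k, dim(A))               # same result: 4 2
```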

Matrices and Arrays - 8

A <- A[2:3, -2] # keep rows 2 and 3 and drop column 2
one <- rep(1, 2)
cbind(A, one) # bind one as a new column
##           one
## [1,] 2 45   1
## [2,] 3 38   1
rbind(A, one) # bind one as a new row
##     [,1] [,2]
##        2   45
##        3   38
## one    1    1
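A new column can also be named directly inside `cbind()`. The sketch below rebuilds the current 2 by 2 `A` as `B` (an illustrative name) and appends its row totals computed by `rowSums()`:

```r
B <- matrix(c(2, 3, 45, 38), ncol = 2)   # same values as A above
res <- cbind(B, total = rowSums(B))      # total column: 47 41
res
```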

Matrices and Arrays - 9

X 
##      [,1] [,2] [,3]
## [1,]    1    2    5
## [2,]    3    4    8
colnames(X) <- c("C1", "C2", "C3")
X
##      C1 C2 C3
## [1,]  1  2  5
## [2,]  3  4  8
rownames(X) <- c("R1", "R2")
X
##    C1 C2 C3
## R1  1  2  5
## R2  3  4  8
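Once rows and columns have names, they can be used as subscripts. The names can also be set when the matrix is created, through the `dimnames` argument of `matrix()`. A sketch that rebuilds `X`:

```r
X <- matrix(c(1, 3, 2, 4, 5, 8), nrow = 2,
            dimnames = list(c("R1", "R2"), c("C1", "C2", "C3")))
X["R2", "C3"]        # 8
X[, c("C1", "C3")]   # columns selected by name
dimnames(X)          # list of row names and column names
```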

Matrices and Arrays - 10

# Create two matrices.
set.seed(1000)
matrix1 <- matrix(sample(1:100, 9), ncol=3)
matrix2 <- matrix(sample(1:100, 9), ncol=3)

# Stack the two 3 x 3 matrices into a 3 x 3 x 2 array (one matrix per layer).
result <- array(cbind(matrix1, matrix2), dim = c(3,3,2))
result
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]   68   51   61
## [2,]   43   88   18
## [3,]   86   29   22
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   45   41   58
## [2,]   38   29   55
## [3,]   33   26   18
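Arrays are indexed with one subscript per dimension, and `apply()` can summarize along a chosen dimension. A sketch on a deterministic array (the name `arr` is illustrative):

```r
arr <- array(1:18, dim = c(3, 3, 2))
arr[, , 2]            # the second 3 x 3 layer: 10 to 18
arr[1, 2, 2]          # row 1, column 2, layer 2: 13
dim(arr)              # 3 3 2
apply(arr, 3, sum)    # sum of each layer: 45 126
```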


References

Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press.
Xie, Yihui, Joseph J Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. CRC Press.