class: center, middle, inverse, title-slide .title[ # MTH 209 Data Manipulation and Management ] .subtitle[ ## Lesson 3: Basic Data Structures
(Vectors, Matrices, and Arrays) ] .author[ ###
Ying-Ju Tessa Chen, PhD
Associate Professor
Department of Mathematics
University of Dayton
@ying-ju
ying-ju
ychen4@udayton.edu
]

---

# Learning Objectives

In this lesson, we will introduce the following data structures in R.

- Vectors
- Matrices
- Arrays

**Note:** This lesson is based on the book *The Art of R Programming*.

.left[.footnote[Matloff, Norman. The art of R programming: A tour of statistical software design. No Starch Press, 2011]]

---

## Vectors - 1

The **vector** is the fundamental data type in R. In this session, we will focus on the first two topics below. The third topic will be introduced later this semester.

1. Recycling: the automatic extension of a shorter vector to match a longer one in certain operations
2. Filtering: the extraction of subsets of vectors
3. Vectorization: applying a function to a vector element by element

---

## Vectors - 2

.pull-left[

- Adding and Deleting Vector Elements

```r
x <- c(147, 85, 21, 99, 38, 47, 1, 27)
x <- c(x[2:5], 217, 304, x[6])
x
```

```
## [1]  85  21  99  38 217 304  47
```

```r
# Use negative subscripts to exclude certain elements.
x[-c(1,4,8)]
```

```
## [1]  21  99 217 304  47
```
]

.pull-right[

- Obtaining the Length of a Vector

.small[
We can use the <span Style="color:blue">length()</span> function to get the length of a vector or a factor.]

```r
length(x)
```

```
## [1] 7
```

```r
x[-length(x)]
```

```
## [1]  85  21  99  38 217 304
```
]

---

## Vectors - 3

.pull-left[

- Recycling

.small[
When two vectors of unequal length are added, R recycles the shorter one: its elements are reused from the start until the longer length is reached. The sum below is 7 3 9 11 2, and R also warns that the longer length is not a multiple of the shorter one.]

```r
c(4, 1, 5) + c(3, 2, 4, 7, 1)
```

- Common Vector Operations

.small[
When the <span Style="color:blue">*</span> function is applied to two vectors, the multiplication is done element by element. The same principle applies to other numeric operators.]
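
.small[
As a minimal sketch (the vectors `short` and `long` are made up for illustration and do not come from the book), recycling is silent when the longer length is a multiple of the shorter one, and a single number is recycled in the same way.]

```r
short <- c(10, 20)
long  <- c(1, 2, 3, 4)

# short is reused as 10, 20, 10, 20 to match the length of long
recycled_sum <- long + short
recycled_sum

# a length-1 vector is recycled across every element of long
doubled <- 2 * long
doubled
```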
]

.pull-right[

```r
c(1, 2, 3) * c(4, 1, 5)
```

```
## [1]  4  2 15
```

```r
c(1, 3, 4) / c(4, 1, 5)
```

```
## [1] 0.25 3.00 0.80
```

```r
# x %% y returns the remainder of x divided by y
c(1, 3, 4) %% c(4, 1, 5)
```

```
## [1] 1 0 4
```
]

---

## Vectors - 4

.pull-left[

- Generating Useful Vectors with the : Operator

.small[
We can use the <span Style="color:blue">:</span> operator to create a vector consisting of a range of numbers.]

```r
7:19
```

```
## [1]  7  8  9 10 11 12 13 14 15 16 17 18 19
```

```r
15:6
```

```
## [1] 15 14 13 12 11 10  9  8  7  6
```
]

.pull-right[

.small[Be aware of operator precedence: `:` is evaluated before `-`, so `1:10-1` means `(1:10) - 1`.]

```r
1:10-1
```

```
## [1] 0 1 2 3 4 5 6 7 8 9
```

```r
1:(10-1)
```

```
## [1] 1 2 3 4 5 6 7 8 9
```
]

---

## Vectors - 5

.pull-left[

- Generating Vector Sequences with seq()

.small[
We can use the <span Style="color:blue">seq()</span> function to generate a sequence in arithmetic progression.]

```r
seq(from = 0.1, to = 1, by = 0.1)
```

```
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
```

```r
seq(from = 0.1, to = 1, length.out = 5)
```

```
## [1] 0.100 0.325 0.550 0.775 1.000
```
]

.pull-right[

- Know the default arguments in the <span Style="color:blue">seq()</span> function.

```r
seq(1, 10)
```

```
## [1]  1  2  3  4  5  6  7  8  9 10
```

```r
seq(1, 10, 2)
```

```
## [1] 1 3 5 7 9
```

.small[
**Note:** By default, unnamed arguments are matched in the order *from*, *to*, and *by*.]
]

---

## Vectors - 6

.pull-left[

- Repeating Vector Constants with rep()

.small[
We can use the <span Style="color:blue">rep()</span> function to fill a long vector with copies of the same constant or pattern.]
```r
x <- rep(NA, 3)
x
```

```
## [1] NA NA NA
```

```r
rep(3, 10)
```

```
## [1] 3 3 3 3 3 3 3 3 3 3
```
]

.pull-right[

```r
rep(c(1, 3, -2), 5)
```

```
## [1]  1  3 -2  1  3 -2  1  3 -2  1  3 -2  1  3 -2
```

```r
rep(c(1, 3, -2), each = 2)
```

```
## [1]  1  1  3  3 -2 -2
```

```r
rep("Hello", 4)
```

```
## [1] "Hello" "Hello" "Hello" "Hello"
```
]

---

## Vectors - 7

.pull-left[

- Using any() and all()

.small[
The <span Style="color:blue">any()</span> and <span Style="color:blue">all()</span> functions return a logical value indicating whether any or all of the elements of their argument are TRUE.]

```r
x <- c(68, 43, 86, 51, 88, 29, 61, 18, 22, 45)
any(x > 50)
```

```
## [1] TRUE
```

```r
all(x < 75)
```

```
## [1] FALSE
```
]

.pull-right[

- Be aware of missing values (NA)

```r
x <- c(NA, x, NA, NA) # include 3 NAs in x
any(x < 10)
```

```
## [1] NA
```

```r
any(x < 10, na.rm=T) # T means TRUE
```

```
## [1] FALSE
```
]

---

## Matrices and Arrays - 1

A matrix is a two-dimensional rectangular data structure in R. A matrix is a vector with two additional attributes: the number of rows and the number of columns. Matrix row and column subscripts start at 1.

The matrix `\(A\)` below has `\(m\)` rows and `\(n\)` columns. We say that the dimension of `\(A\)` is `\(m\times n\)`.

`$$A = \left( \begin{array}{cccc} A[1,1] & A[1,2] & \ldots & A[1,n]\\ A[2,1] & A[2,2] & \ldots & A[2,n]\\ \vdots & \vdots & \ddots & \vdots\\ A[m,1] & A[m,2] & \ldots & A[m,n] \end{array} \right)$$`

---

## Matrices and Arrays - 2

- Creating Matrices

.small[
We can use the <span Style="color:blue">matrix()</span> function to create a matrix. A matrix is stored internally in *column-major order* (all of column 1, then column 2, and so on), which is also how <span Style="color:blue">matrix()</span> fills in its values by default. The <span Style="color:blue">dim()</span> function can be used to check the dimension of a matrix.
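
As a minimal sketch of what column-major order means (the names `v`, `by_col`, and `by_row` are illustrative assumptions), the same six values give different matrices depending on the fill direction, and flattening the default matrix returns the values in their original order.

```r
v <- c(1, 3, 2, 4, 5, 8)

by_col <- matrix(v, nrow = 2)                # default: fill column by column
by_row <- matrix(v, nrow = 2, byrow = TRUE)  # fill row by row instead

by_col[, 1]        # first column holds the first two values of v
by_row[1, ]        # first row holds the first three values of v
as.vector(by_col)  # flattening follows the storage order, so this equals v
```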
]

.pull-left[

```r
X <- matrix(c(1, 3, 2, 4, 5, 8), nrow = 2)
dim(X)
```

```
## [1] 2 3
```

```r
# obtain the number of rows
nrow(X)
```

```
## [1] 2
```
]

.pull-right[

```r
# obtain the number of columns
ncol(X)
```

```
## [1] 3
```

```r
matrix(c(1, 3, 2, 4, 5, 8), ncol=2, byrow = T)
```
]

---

## Matrices and Arrays - 3

- General Matrix Operations

.small[
We can perform linear algebra operations on matrices, such as matrix multiplication, scalar multiplication, and matrix addition.]

.pull-left[

```r
A <- matrix(c(1, 2, 3, 4), ncol=2)
A
```

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

```r
A %*% A # matrix multiplication
```

```
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22
```
]

.pull-right[

```r
-2 * A
```

```
##      [,1] [,2]
## [1,]   -2   -6
## [2,]   -4   -8
```

```r
A + A
```

```
##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8
```
]

---

## Matrices and Arrays - 4

- Matrix Indexing

.small[
The indexing operations we discussed for vectors apply to matrices as well.]

.pull-left[

```r
# set a random seed
set.seed(1000)
# use 12 random numbers to create a 4 by 3 matrix.
A <- matrix(sample(1:100, 12), ncol=3)
A
```

```
##      [,1] [,2] [,3]
## [1,]   68   88   22
## [2,]   43   29   45
## [3,]   86   61   38
## [4,]   51   18   33
```
]

.pull-right[

```r
# extract rows 2 through 3
A[2:3,]
```

```
##      [,1] [,2] [,3]
## [1,]   43   29   45
## [2,]   86   61   38
```

```r
# remove the 1st and 3rd columns
# (the single remaining column is returned as a vector)
A[, - c(1,3)]
```

```
## [1] 88 29 61 18
```
]

---

## Matrices and Arrays - 5

We can also assign values to submatrices. The following code chunks show an example based on A given on the previous slide.

.pull-left[

```r
A
```

```
##      [,1] [,2] [,3]
## [1,]   68   88   22
## [2,]   43   29   45
## [3,]   86   61   38
## [4,]   51   18   33
```
]

.pull-right[

```r
A[, 1:2] <- matrix(c(1, 2, 3, 4, 7, 5, 8, -3), ncol=2)
A
```

```
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    2    5   45
## [3,]    3    8   38
## [4,]    4   -3   33
```
]

---

## Matrices and Arrays - 6

- Filtering on Matrices

.small[
We can extract values from a matrix based on conditions.]
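
.small[
A minimal sketch with a hypothetical matrix `M` (an assumption for illustration, not one of the matrices in this lesson): a condition that matches a single row returns a plain vector unless we add `drop = FALSE`.]

```r
M <- matrix(c(1, 2, 3, 4, 7, 5, 8, -3), ncol = 2)

keep <- M[, 2] >= 8                     # logical vector, one entry per row
one_row_vec <- M[keep, ]                # a single match collapses to a vector
one_row_mat <- M[keep, , drop = FALSE]  # drop = FALSE keeps a 1 x 2 matrix

dim(one_row_vec)
dim(one_row_mat)
```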
.pull-left[

```r
A
```

```
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    2    5   45
## [3,]    3    8   38
## [4,]    4   -3   33
```

```r
A[A[,2] >= 6, ]
```

```
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    3    8   38
```
]

.pull-right[

```r
A[which(A[,2] >= 6), ]
```

```
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    3    8   38
```

.small[
`A[,2] >= 6` tests whether each entry of the 2nd column is at least 6 and returns a logical vector with one value per row. The <span Style="color:blue">which()</span> function reports which indices are *TRUE*.
]
]

---

## Matrices and Arrays - 7

.pull-left[
.small[
We should note that the filtering criterion can be based on a variable different from the one to which the filtering will be applied. This is very useful when we have more than one matrix (or data frame) and we want to use a condition on one of them to extract values from another.

```r
A
```

```
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    2    5   45
## [3,]    3    8   38
## [4,]    4   -3   33
```

```r
z <- c(7, 8, 20, 31)
```
]
]

.pull-right[
.small[

```r
A[z %% 2 == 1, ]
```

```
##      [,1] [,2] [,3]
## [1,]    1    7   22
## [2,]    4   -3   33
```

Here, *z %% 2 == 1* checks whether each element of z is odd and returns a logical vector.

Since matrices are vectors, we can apply vector operations to them as well. The code chunk below shows an example; the indices it returns follow column-major order.

```r
which(A < 20)
```

```
## [1] 1 2 3 4 5 6 7 8
```
]
]

---

## Matrices and Arrays - 8

.pull-left[

- Adding and Deleting Matrix Rows and Columns

.small[
Reassigning a matrix to a subset of itself changes its dimensions.
]

```r
A <- A[2:3, -2]
```

- Use the <span Style="color:blue">cbind()</span> or <span Style="color:blue">rbind()</span> functions

.small[
The <span Style="color:blue">cbind()</span> function combines two matrices by columns, so the two matrices must have the same number of rows. The <span Style="color:blue">rbind()</span> function works the same way for rows, so the two matrices must have the same number of columns.]
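
.small[
A minimal sketch of the dimension requirement, using made-up matrices `M1` and `M2` (assumptions for illustration): the pieces must agree in the dimension that is not being extended.]

```r
M1 <- matrix(1:4, nrow = 2)    # 2 x 2
M2 <- matrix(5:10, nrow = 2)   # 2 x 3

wide <- cbind(M1, M2)          # same number of rows, so columns are appended
tall <- rbind(M1, c(0, 0))     # a length-2 vector is added as a new row

dim(wide)
dim(tall)
```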
]

.pull-right[

```r
one <- rep(1, 2)
cbind(A, one)
```

```
##           one
## [1,] 2 45   1
## [2,] 3 38   1
```

```r
rbind(A, one)
```

```
##     [,1] [,2]
##        2   45
##        3   38
## one    1    1
```
]

---

## Matrices and Arrays - 9

.pull-left[

- Naming Matrix Rows and Columns

.small[
We can give names to the columns or rows of a matrix if necessary. The <span Style="color:blue">colnames()</span> function can be used to name matrix columns, and <span Style="color:blue">rownames()</span> works similarly for rows.

```r
X
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    5
## [2,]    3    4    8
```
]
]

.pull-right[

```r
colnames(X) <- c("C1", "C2", "C3")
X
```

```
##      C1 C2 C3
## [1,]  1  2  5
## [2,]  3  4  8
```

```r
rownames(X) <- c("R1", "R2")
X
```

```
##    C1 C2 C3
## R1  1  2  5
## R2  3  4  8
```
]

---

## Matrices and Arrays - 10

.pull-left[

- Higher-Dimension Arrays

.small[
Suppose we have data taken at different times, one data point per person per variable per time. Time becomes the third dimension, in addition to rows and columns. Such a data structure is called an array in R.

```r
# Create two matrices.
set.seed(1000)
matrix1 <- matrix(sample(1:100, 9), ncol=3)
matrix2 <- matrix(sample(1:100, 9), ncol=3)
# Take these matrices as input to the array.
result <- array(cbind(matrix1, matrix2), dim = c(3,3,2))
```
]
]

.pull-right[
.small[

```r
result
```

```
## , , 1
##
##      [,1] [,2] [,3]
## [1,]   68   51   61
## [2,]   43   88   18
## [3,]   86   29   22
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]   45   41   58
## [2,]   38   29   55
## [3,]   33   26   18
```
]
]

---

# Summary of Main Points

By now, you should know

- the basic data structures: vectors, matrices, and arrays
- how R recycles the elements of a shorter vector
- how to extract elements from a data object

---

# Supplementary Materials

Here are some useful supplementary materials for self-learning.
.pull-left[ .center[[<img src="https://d33wubrfki0l68.cloudfront.net/565916198b0be51bf88b36f94b80c7ea67cafe7c/7f70b/cover.png" height="250px">](https://adv-r.hadley.nz)] .small[ * [Vectors](https://adv-r.hadley.nz/vectors-chap.html) * [Subsetting](https://adv-r.hadley.nz/subsetting.html) ] ] .pull-right[ .center[[<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" height="250px">](https://r4ds.had.co.nz)] .small[ * [Vectors](https://r4ds.had.co.nz/vectors.html) ] ]