MTH 209 Data Manipulation and Management

Lesson 2: Basic Syntax and Data Types

Ying-Ju Tessa Chen
ychen4@udayton.edu
University of Dayton

Basic Syntax

We will introduce some basic syntax in R.

Pound Sign in R

The # symbol (pound sign) marks a comment or note in your code. On any line, everything after # is not executed.

# Hello, UD students!

Examples of Simple Algebra

7+11
7-11
7 - 11
7*11
7/11
7 / 11

Note: you should find that 7/11 and 7 / 11 generate the same result. This is because blank spaces around operators are generally ignored.
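One caveat, shown as a small sketch: spaces are ignored around an operator, but a space inside an operator changes the meaning. The variable name a is made up for this illustration.

```r
a <- 5    # hypothetical variable used only in this sketch
a<-3      # read as a <- 3: assigns 3 to a
a
## [1] 3
a < -3    # the space splits "<-", so this compares a with -3
## [1] FALSE
```

Writing spaces around <- and around comparison operators avoids this ambiguity.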

Assignment Operators: Equal Sign and Arrow in R

a1 = 7
b1 = 11
a2 <- 7
11 -> b2

What’s the difference between them?

mean(x = c(1, 8, 4, 9, 13))
x
## Error in eval(expr, envir, enclos): object 'x' not found
mean(x <- c(1, 8, 4, 9, 13))
x
## [1]  1  8  4  9 13

In the first case, \(x\) is only the name of an argument of the function mean(), so no object \(x\) is created. The second case assigns the vector \((1, 8, 4, 9, 13)\) to \(x\) and then finds its mean.

We should use <- as an assignment operator and = for function arguments!
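A minimal sketch of this rule in practice. The names avg, avg2, and nums are hypothetical and chosen only for illustration.

```r
avg <- mean(x = c(1, 8, 4, 9, 13))        # = only names the argument x of mean()
avg
## [1] 7
avg2 <- mean(nums <- c(1, 8, 4, 9, 13))   # <- creates nums, then passes it to mean()
nums
## [1]  1  8  4  9 13

# Usually clearer: assign first, then call the function
nums <- c(1, 8, 4, 9, 13)
mean(x = nums)
## [1] 7
```

Assigning on its own line keeps the function call easy to read and makes it obvious which objects exist afterwards.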

Parentheses, Brackets & Curly Brackets

Parentheses, ( ), are used to call functions. Brackets, [ ], are used to extract values from a data structure. Curly brackets, { }, are used to denote a block of code in a function or in a conditional statement.

Here, we give examples of the use of ( ) and [ ]. The use of curly brackets will be introduced later.

w <- c(17, 57, 69, 50, 100, 68, 29, 16, 65, 5, 15, 25) # c() combines objects
median(w) # find the median of w
w[3] # find the value of the third element in w
w[1:2] # find the values of the first and second elements in w
w[2:4] # find the values of the second through the fourth elements in w
w[c(2,5,8)] # only find the values of the second, fifth and eighth elements in w
w[-5] # all elements of w except the fifth (w itself is not changed)
w[w < 50] # only obtain values that satisfy the condition
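To see why w[w < 50] works, it can help to look at the condition by itself. The sketch below stores it in a variable named keep, a name chosen here only for illustration.

```r
w <- c(17, 57, 69, 50, 100, 68, 29, 16, 65, 5, 15, 25)
keep <- w < 50   # one TRUE/FALSE per element of w
keep
##  [1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
w[keep]          # same result as w[w < 50]
## [1] 17 29 16  5 15 25
sum(keep)        # how many elements satisfy the condition
## [1] 6
which(keep)      # positions of those elements
## [1]  1  7  8 10 11 12
```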

Note: c() can concatenate more than just vectors. We will talk about this later.

Basic Data Types

Here are some basic data types in R.

We will focus on the use of the first five types.

Character - 1

A character object is used to store text, letters, or words (strings) in R.

x <- "Hello"
y <- "UD students!"
class(x) # we can use class() function to obtain the data type
## [1] "character"
nchar(x) # use nchar() to count the number of characters
## [1] 5

Note: When defining strings, double quotes " " and single quotes ' ' are interchangeable, but double quotes are preferred (and character constants are printed using double quotes). Single quotes are normally used only to delimit character constants that contain double quotes (R Documentation 2020).
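A short sketch of the note above. The strings and the variable names s1, s2, and quoted are made up for illustration.

```r
s1 <- "Hello"
s2 <- 'Hello'
identical(s1, s2)   # both kinds of quotes create the same string
## [1] TRUE
quoted <- 'She said "Hello"'   # single quotes around text that contains double quotes
nchar(quoted)                  # the inner double quotes count as characters
## [1] 16
identical(quoted, "She said \"Hello\"")   # alternative: escape the inner quotes
## [1] TRUE
```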

Character - 2

If we want to combine two strings into one string, we can use paste() or paste0() function.

paste(x,y)
## [1] "Hello UD students!"
paste(x,y,sep=",")
## [1] "Hello,UD students!"
paste(x,y,sep=", ")
## [1] "Hello, UD students!"
paste(x, ", ", y)
## [1] "Hello ,  UD students!"
paste0(x,y)
## [1] "HelloUD students!"

Note: paste() separates its arguments with a single space by default, while paste0() uses no separator.

Character - 3

Here we give one advanced example.

allfiles1 <- paste("file_", 1:5)
allfiles2 <- paste("file_", 1:5, collapse = "_")
allfiles3 <- paste("file", 1:5, sep = "_")
allfiles1
## [1] "file_ 1" "file_ 2" "file_ 3" "file_ 4" "file_ 5"
allfiles2
## [1] "file_ 1_file_ 2_file_ 3_file_ 4_file_ 5"
allfiles3
## [1] "file_1" "file_2" "file_3" "file_4" "file_5"

What is wrong here?
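One possible reading of the question: paste() inserts its default separator, a single space, between "file_" and each number, and collapse glues all elements into one string. Assuming the goal is the names file_1 to file_5, paste0() is a shorter alternative to allfiles3. The ".csv" extension below is a hypothetical addition for illustration.

```r
allfiles4 <- paste0("file_", 1:5)   # no separator is inserted
allfiles4
## [1] "file_1" "file_2" "file_3" "file_4" "file_5"
identical(allfiles4, paste("file", 1:5, sep = "_"))
## [1] TRUE
paste0("file_", 1:5, ".csv")        # hypothetical extension
## [1] "file_1.csv" "file_2.csv" "file_3.csv" "file_4.csv" "file_5.csv"
```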

Factor

A factor object is used to store categorical / qualitative variables.

grade <- factor(c("A", "C", "B", "B-", "A", "C+", "D", "A-", "B+", "C-", "B"))
gender <- c("M", "F", "F", "M", "M", "M", "F", "M", "F")
gender <- as.factor(gender)
class(gender)
## [1] "factor"
levels(gender) # use levels() to find all categories in the variable
## [1] "F" "M"
length(grade) # use length() to find the length of vectors 
## [1] 11
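As a small sketch, two functions that are often handy with factors are nlevels() and table(). By default the levels are sorted, and the levels argument of factor() sets a different order. The size variable below is hypothetical data used only for illustration.

```r
gender <- factor(c("M", "F", "F", "M", "M", "M", "F", "M", "F"))
nlevels(gender)   # number of categories
## [1] 2
table(gender)     # count per category: F = 4, M = 5

# Hypothetical example: set the order of the levels explicitly
size <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"))
levels(size)
## [1] "low"    "medium" "high"
```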


Numeric

A numeric object is used to store numeric data in R.

x1 <- 3
x2 <- c(-3.13, 2.47, 6, -1.5, 4.29, 2.72, 1, 0, 3.85)
class(x1)
class(x2)
sum(x2)
max(x2)
min(x2)
range(x2)
round(x2) # round off the values
ceiling(x2) # round up to the nearest integer
floor(x2) # round down to the nearest integer
summary(x2)
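The three rounding functions differ most clearly on negative values and on values ending in .5. A minimal sketch, using a short made-up vector v:

```r
v <- c(-1.5, 2.47, 2.72)   # hypothetical values for illustration
round(v)       # nearest integer; a value exactly halfway, such as -1.5, goes to the even neighbor
## [1] -2  2  3
ceiling(v)     # round up
## [1] -1  3  3
floor(v)       # round down
## [1] -2  2  2
round(v, 1)    # second argument: number of decimal places
## [1] -1.5  2.5  2.7
```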

Integer

An integer object is used to store numeric data without decimals.

x2 <- c(-3.13, 2.47, 6, -1.5, 4.29, 2.72, 1, 0, 3.85)
x3 <- as.integer(x2) # keep only the integer part (decimals are dropped, not rounded)
x3
## [1] -3  2  6 -1  4  2  1  0  3
class(x3)
## [1] "integer"
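Two details that the example above suggests, shown as a sketch: a number typed without decimals is still numeric unless it carries the L suffix, and as.integer() truncates toward zero rather than rounding.

```r
class(6)     # a plain number is stored as numeric
## [1] "numeric"
class(6L)    # the L suffix creates an integer
## [1] "integer"
6L == 6      # the two still compare as equal
## [1] TRUE
as.integer(-1.5)          # truncates toward zero
## [1] -1
as.integer(round(-1.5))   # round first if rounding is intended
## [1] -2
```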

Logical

A logical object takes only two values: TRUE or FALSE.

y1 <- -7
y2 <- 11
y1 > y2
## [1] FALSE
y1 == y2  # check if two objects are the same
## [1] FALSE
y1 <= y2
## [1] TRUE
result <- y1 > y2
class(result)
## [1] "logical"
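Logical values can also be combined. The sketch below reuses y1 and y2 from above with the operators != (not equal), & (and), | (or), and ! (not). The last line compares a whole vector at once.

```r
y1 <- -7
y2 <- 11
y1 != y2              # not equal
## [1] TRUE
(y1 < 0) & (y2 < 0)   # TRUE only if both conditions hold
## [1] FALSE
(y1 < 0) | (y2 < 0)   # TRUE if at least one condition holds
## [1] TRUE
!(y1 > y2)            # negation
## [1] TRUE
c(y1, y2, 3) > 0      # comparisons work element by element
## [1] FALSE  TRUE  TRUE
```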

Create an empty vector

In some situations, we may want to create an empty vector or a vector of placeholder values to fill in later. Here is a simple example.

x <- c()
y1 <- vector("character", length=3)
y2 <- character(3)
z1 <- vector("numeric", 5)
z2 <- numeric(5)
w <- rep(NA, 2)
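Inspecting these objects shows what "empty" means in each case. This is a sketch, and the values appended at the end are made up.

```r
x <- c()
is.null(x)          # c() with no arguments gives NULL
## [1] TRUE
length(x)
## [1] 0
character(3)        # three empty strings
## [1] "" "" ""
numeric(5)          # five zeros
## [1] 0 0 0 0 0
class(rep(NA, 2))   # NA is logical by default
## [1] "logical"

x <- c(x, 4.2)      # append values one at a time
x <- c(x, 7)
x
## [1] 4.2 7.0
```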

Remark for as.character(), as.integer(), as.numeric(), as.factor() functions

We can use these functions to convert an object from one data type to another.

z1 <- as.integer(c(3, 5.8))
class(z1)
## [1] "integer"
z2 <- as.character(z1) # transform integer object to character
z2
## [1] "3" "5"
as.numeric(3 > 8)  # transform logical object to numeric values
## [1] 0
gender <- factor(c("M", "F", "F", "M", "M", "M", "F", "M", "F"))
as.numeric(gender) # return the integer codes of the levels (F = 1, M = 2)
## [1] 2 1 1 2 2 2 1 2 1

Note: The as.numeric() function converts logical objects to numeric values: TRUE becomes 1 and FALSE becomes 0.
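Because as.numeric() on a factor returns level codes, it can surprise when the levels themselves look like numbers. A minimal sketch with a hypothetical factor f:

```r
f <- factor(c("10", "20", "10"))   # hypothetical data: numbers stored as categories
as.numeric(f)                      # level codes, not the original values
## [1] 1 2 1
as.numeric(as.character(f))        # convert to character first to recover the values
## [1] 10 20 10
sum(c(TRUE, FALSE, TRUE))          # TRUE counts as 1, so sum() counts the TRUEs
## [1] 2
```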

R Documentation. 2020. “R: Quotes.” https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html.
Xie, Yihui, Joseph J Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. CRC Press.