Packages are the standard unit for sharing code in R. In general, a package bundles code, data, documentation, tests, and so on. Most people publish their packages on CRAN (the Comprehensive R Archive Network), while some share their code on GitHub or other websites. We recommend that you download packages only from CRAN, since packages there must pass CRAN's checks and are generally well maintained.
In order to import a package in RStudio, you need to:
1. know the name of the package;
2. download (install) the package. Here, we introduce two basic methods:
Note: It is essential to put the quotation marks around the package’s name.
Note: Leave the Install dependencies box checked so that R also downloads any additional packages required by the functions or data in the package you are installing.
Note: Sometimes the Console shows a warning when you install or load certain packages, stating that the package was built under a different version of R (for example, "package 'xyz' was built under R version ..."). In general, these warnings can be ignored, since such packages usually still work with your version of R.
You only need to install a package once, the first time you need it. After that, you can load it with library() or require() whenever you need it; note that a package must be loaded again in each new R session.
The main difference between library() and require() is that library() throws an error if the package is not installed, while require() returns FALSE and gives a warning.
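A minimal sketch of this difference follows. The package name "notARealPackage123" is hypothetical and deliberately does not exist; the base package stats stands in for a package that is installed. The install.packages() line is commented out because it needs an internet connection and only has to run once.

```r
# Install once (needs an internet connection); the quotation marks are required:
# install.packages("tidyverse")

# library() stops with an error when the package is not installed.
# "notARealPackage123" is a hypothetical name used only for illustration.
lib_result <- tryCatch(
  {
    library("notARealPackage123")
    "loaded"
  },
  error = function(e) "error"
)
print(lib_result)   # "error"

# require() only warns and returns FALSE, so the rest of the script keeps running.
# suppressWarnings() just keeps this demo's output short.
req_result <- suppressWarnings(require("notARealPackage123"))
print(req_result)   # FALSE

# For an installed package (here the base package stats), require() returns TRUE.
stats_loaded <- require("stats")
print(stats_loaded) # TRUE
```

Because require() returns a logical value, it is handy inside if () statements, whereas library() is the safer choice at the top of a script, since a missing package stops the script right away.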
In this section, we introduce two methods for importing data in some commonly used formats, and we show how to write data to files.
Note: A CSV (comma-separated values) file is a text file in which information is separated by commas.
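To make the definition concrete, here is a minimal sketch that writes a small, made-up CSV file to a temporary location and reads it back with readr's read_csv(). The file contents and column names are illustrative only.

```r
library(readr)

# A tiny, made-up CSV file: the first line holds the column names and
# every following line is one observation, with values separated by commas.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "job_title,salary,remote_ratio",
  "Data Scientist,70000,0",
  "Data Analyst,72000,100"
), csv_path)

# read_csv() guesses each column's type and returns a tibble
jobs <- read_csv(csv_path)
print(jobs)
```

Here job_title should be read as a character column and salary and remote_ratio as numeric columns.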
In general, these functions work well: we pass the path to a file and obtain a tibble, which is a modern reimagining of the data frame. Tibbles make it easy to navigate, view, and manipulate data, especially when the data are tidy, that is, when every row corresponds to an observation and every column corresponds to a variable.
The following code chunk gives an example of reading a data file.
library(tidyverse)
ds_salaries <- read_csv("C:/Users/ychen4/Dropbox/MTH 209/Data for Brainstorm Activities/ds_salaries.csv")
head(ds_salaries) # use head() to view the first six rows of the data
## # A tibble: 6 × 12
## ...1 work_year exper…¹ emplo…² job_t…³ salary salar…⁴ salar…⁵ emplo…⁶ remot…⁷
## <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
## 1 0 2020 MI FT Data S… 70000 EUR 79833 DE 0
## 2 1 2020 SE FT Machin… 260000 USD 260000 JP 0
## 3 2 2020 SE FT Big Da… 85000 GBP 109024 GB 50
## 4 3 2020 MI FT Produc… 20000 USD 20000 HN 0
## 5 4 2020 SE FT Machin… 150000 USD 150000 US 50
## 6 5 2020 EN FT Data A… 72000 USD 72000 US 100
## # … with 2 more variables: company_location <chr>, company_size <chr>, and
## # abbreviated variable names ¹experience_level, ²employment_type, ³job_title,
## # ⁴salary_currency, ⁵salary_in_usd, ⁶employee_residence, ⁷remote_ratio
glimpse(ds_salaries) # use glimpse() to see every column at once
## Rows: 607
## Columns: 12
## $ ...1 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
## $ work_year <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 202…
## $ experience_level <chr> "MI", "SE", "SE", "MI", "SE", "EN", "SE", "MI", "MI…
## $ employment_type <chr> "FT", "FT", "FT", "FT", "FT", "FT", "FT", "FT", "FT…
## $ job_title <chr> "Data Scientist", "Machine Learning Scientist", "Bi…
## $ salary <dbl> 70000, 260000, 85000, 20000, 150000, 72000, 190000,…
## $ salary_currency <chr> "EUR", "USD", "GBP", "USD", "USD", "USD", "USD", "H…
## $ salary_in_usd <dbl> 79833, 260000, 109024, 20000, 150000, 72000, 190000…
## $ employee_residence <chr> "DE", "JP", "GB", "HN", "US", "US", "US", "HU", "US…
## $ remote_ratio <dbl> 0, 0, 50, 0, 50, 100, 100, 50, 100, 50, 0, 0, 0, 10…
## $ company_location <chr> "DE", "JP", "GB", "HN", "US", "US", "US", "HU", "US…
## $ company_size <chr> "L", "S", "M", "S", "L", "L", "S", "L", "L", "S", "…
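Note: glimpse() comes from the dplyr package, which is loaded by library(tidyverse). It prints one line per column, showing the column's name, type, and first few values.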
We can also read data directly from a URL.
OH_COVID <- read_csv("https://coronavirus.ohio.gov/static/dashboards/COVIDDeathData_CountyOfDeath.csv")
glimpse(OH_COVID)
## Rows: 765,639
## Columns: 11
## $ County <chr> "Adams", "Adams", "Adam…
## $ Sex <chr> NA, NA, NA, NA, NA, NA,…
## $ `Age Range` <chr> "0-19", "0-19", "0-19",…
## $ `Onset Date` <date> 2020-12-10, 2020-12-11…
## $ `Admission Date` <chr> NA, NA, NA, NA, NA, NA,…
## $ `Date Of Death` <date> NA, NA, NA, NA, NA, NA…
## $ `Case Count` <dbl> 2, 1, 1, 1, 1, 2, 1, 1,…
## $ `Hospitalized Count` <dbl> 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Death Due To Illness Count - County Of Death` <dbl> 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `State of Death` <chr> NA, NA, NA, NA, NA, NA,…
## $ `State of Residence` <chr> NA, NA, NA, NA, NA, NA,…
Note: Some variable names are wrapped in backticks (`), not single quotes. This is because those names contain at least one space, which is not allowed in a syntactic R name, so R surrounds them with backticks.
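The backticks are only quoting syntax: they are not stored as part of the name. A minimal sketch with a small, made-up data frame (base R only) shows this:

```r
# A data frame whose column name contains a space.
# check.names = FALSE keeps the name exactly as written.
cases <- data.frame(`Case Count` = c(2, 1, 1), check.names = FALSE)

# The stored name has no backticks in it...
print(names(cases))            # "Case Count"

# ...but backticks are needed to refer to the column in code, because of the space.
total_cases <- sum(cases$`Case Count`)
print(total_cases)             # 4
```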
Question: If we want to remove the backticks from the variable names, what could be a possible solution?
The next data file was queried from CDC WONDER and saved as a tab-delimited text file, so we read it with read_tsv().
CDC_Death <- read_tsv("C:/Users/ychen4/Dropbox/MTH 209/Class Handouts/Data/Underlying Cause of Death.txt")
glimpse(CDC_Death)
## Rows: 4,220
## Columns: 14
## $ Notes <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Year <dbl> 2005, 2005, 2005, 2005, 2005, 2005, 2005, …
## $ `Year Code` <dbl> 2005, 2005, 2005, 2005, 2005, 2005, 2005, …
## $ `Five-Year Age Groups` <chr> "< 1 year", "< 1 year", "< 1 year", "< 1 y…
## $ `Five-Year Age Groups Code` <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ Gender <chr> "Female", "Female", "Female", "Female", "F…
## $ `Gender Code` <chr> "F", "F", "F", "F", "F", "F", "F", "F", "M…
## $ Race <chr> "American Indian or Alaska Native", "Asian…
## $ `Race Code` <chr> "1002-5", "A-PI", "2054-5", "2054-5", "205…
## $ `Hispanic Origin` <chr> "Not Hispanic or Latino", "Not Hispanic or…
## $ `Hispanic Origin Code` <chr> "2186-2", "2186-2", "2135-2", "2186-2", "N…
## $ Deaths <dbl> 56, 94, 26, 895, 10, 382, 1373, 15, 83, 10…
## $ Population <chr> "8158", "20150", "7024", "74736", "Not App…
## $ `Crude Rate` <chr> "686.4", "466.5", "370.2", "1197.5", "Not …
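In the output above, Population and Crude Rate are stored as <chr> even though they look numeric, because some entries are text (shown truncated as "Not App…"). One option is readr's na argument, which lists the strings to treat as missing. The sketch below uses a small, made-up tab-delimited file in which the missing-value marker is assumed to be "Not Applicable"; check the exact marker in your own file before relying on it.

```r
library(readr)

# A small, made-up tab-delimited file; "Not Applicable" is an assumed
# marker for a missing population value.
tsv_path <- tempfile(fileext = ".txt")
writeLines(c(
  "Year\tDeaths\tPopulation",
  "2005\t56\t8158",
  "2005\t10\tNot Applicable"
), tsv_path)

# By default only "" and "NA" count as missing, so Population is read as text
default_read <- read_tsv(tsv_path)
print(class(default_read$Population))   # "character"

# Declaring "Not Applicable" as missing lets readr parse Population as a number
fixed_read <- read_tsv(tsv_path, na = c("", "NA", "Not Applicable"))
print(class(fixed_read$Population))     # "numeric"
```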
Note: In many programming languages, such as C, C++, Java, Python, Perl, and R, the backslash (\) works as an escape character in strings. So when writing a file path in these languages (for example, a Windows path), use either forward slashes (/) or double backslashes (\\); the string "\\" represents a single backslash.
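A minimal sketch of these rules in R; the path C:/Users/me/data.csv is hypothetical and is never opened, so the code runs on any system.

```r
# "\\" is an escaped backslash: the string contains ONE character
one_backslash <- "\\"
print(nchar(one_backslash))   # 1
writeLines(one_backslash)     # prints: \

# Two ways to spell the same (hypothetical) Windows path.
# Writing "C:\Users\me\data.csv" with single backslashes gives an error,
# because R reads "\U" as the start of an escape sequence.
path_slash     <- "C:/Users/me/data.csv"
path_backslash <- "C:\\Users\\me\\data.csv"
writeLines(path_backslash)    # prints: C:\Users\me\data.csv

# Swapping each backslash for a forward slash gives the first spelling.
# In a regular expression, "\\\\" matches a single literal backslash.
print(gsub("\\\\", "/", path_backslash) == path_slash)   # TRUE

# file.path() joins pieces with "/", which R also accepts on Windows
print(file.path("C:", "Users", "me", "data.csv"))        # "C:/Users/me/data.csv"
```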
Similarly, readr provides the following functions to write files:
Some common arguments of these functions:
We can save the CDC WONDER data to a CSV file.
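A minimal sketch of saving data with readr's write_csv() and reading it back. So that it runs on its own, it uses a small, made-up stand-in for CDC_Death and a temporary file; with the real data you would pass CDC_Death and a path of your choosing instead.

```r
library(readr)

# A small, made-up stand-in for CDC_Death, so the sketch runs on its own
deaths <- data.frame(
  Year = c(2005, 2005),
  Gender = c("Female", "Male"),
  Deaths = c(56, 94)
)

# The first argument is the data to save, the second is the destination file.
# A temporary file is used here instead of a real location on disk.
out_path <- tempfile(fileext = ".csv")
write_csv(deaths, out_path)

# Reading the file back recovers the same values
deaths_back <- read_csv(out_path)
print(deaths_back)
```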
You can use the following single-character keyboard shortcuts to enable alternate display modes (Xie, Allaire, and Grolemund (2018)):
A: Toggle display of the current slide versus all slides (helpful for printing all pages)
B: Make fonts larger
C: Show table of contents
S: Make fonts smaller