Chapter 4 Intro to the {tidyverse}
4.1 Why {tidyverse}?
The {tidyverse} is a collection of R packages that extend the functionality of base R. The packages are designed to simplify and accelerate data analysis with R, and they share an underlying design philosophy, grammar, and data structures. You may say that the {tidyverse} equips R with superpowers. Some of the packages, e.g. {ggplot2} or {tidyr}, may be familiar and may already be installed. But if you'd like to use several of them, you may as well install them all in one go:
# Install the tidyverse only if it cannot be loaded yet, then load it
if (!require(tidyverse)) {
  install.packages("tidyverse", repos = "http://cran.us.r-project.org")
  library(tidyverse)
}
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.8
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
4.2 Reading data from Excel
The {readxl} package, which is installed as part of the {tidyverse}, can be used to read data from Excel files. It is not attached by library(tidyverse), so it has to be loaded separately:
library(readxl)
df_xl <- read_excel("../datasets/allData.xlsx", sheet = 2)
## New names:
## * `` -> ...4
head(df_xl)
## # A tibble: 6 × 5
## Hair Eye Sex ...4 Freq
## <chr> <chr> <chr> <lgl> <dbl>
## 1 <NA> <NA> <NA> NA NA
## 2 Black Brown Male NA 32
## 3 Brown Brown Male NA 53
## 4 Red Brown Male NA 10
## 5 Blond Brown Male NA 3
## 6 Black Blue Male NA 11
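The workbook used above is not shipped with this text, so here is a minimal sketch that can be run anywhere. It assumes the example workbook `datasets.xlsx` that is bundled with {readxl} (a stand-in, not the workshop data). It lists the sheet names first and then reads one sheet by name instead of by position:

```r
library(readxl)

# Path to an example workbook that ships with {readxl}
# (assumption: used here as a stand-in for "../datasets/allData.xlsx")
path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding which one to read
sheets <- excel_sheets(path)
print(sheets)

# Read the second sheet, selected by name rather than by position
df_example <- read_excel(path, sheet = sheets[2])
head(df_example)
```

Selecting a sheet by name tends to be more robust than a position, because the code keeps working when sheets are reordered in the workbook.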
4.3 The pipe operator
Unlike the {openxlsx} package, {readxl} does not automatically detect and remove empty rows or columns. We can remove them ourselves by adding a function with the pipe operator %>%, which is often used in the tidyverse. This operator takes the result of one function and feeds it into the next function as its first argument. The function that we will use is drop_na(), and we tell it to remove every row with a missing value (NA) in the column "Hair":
df_xl <- read_excel("../datasets/allData.xlsx", sheet = 2, skip = 1) %>% drop_na(Hair)
## New names:
## * `` -> ...4
head(df_xl)
## # A tibble: 6 × 5
## Hair Eye Sex ...4 Freq
## <chr> <chr> <chr> <lgl> <dbl>
## 1 Black Brown Male NA 32
## 2 Brown Brown Male NA 53
## 3 Red Brown Male NA 10
## 4 Blond Brown Male NA 3
## 5 Black Blue Male NA 11
## 6 Brown Blue Male NA 50
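To see what the pipe does, it can help to compare a piped command with the equivalent nested call. The following is a minimal sketch on a small made-up tibble (the object `toy` and its values are hypothetical and only mimic the empty first row of the sheet above):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with an empty first row
toy <- tibble(
  Hair = c(NA, "Black", "Brown"),
  Freq = c(NA, 32, 53)
)

# Nested call: has to be read from the inside out
nested <- head(drop_na(toy, Hair), 2)

# Piped call: reads from left to right, one step per line
piped <- toy %>%
  drop_na(Hair) %>%
  head(2)

# Check that both forms give the same result
identical(nested, piped)
```

In both versions the data is passed as the first argument of drop_na(). The pipe only changes how the command is written, not what it computes.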
It is possible to chain multiple pipe operators to combine several functions in a single command. Here we add another function, select(), to get rid of the fourth column, since it is empty. The - indicates that column number 4 is excluded from the selection. Since commands become pretty long when multiple pipe operators are used, it is good practice to start each function on a new line:
read_excel("../datasets/allData.xlsx", sheet = 2, skip = 1) %>%
  drop_na(Hair) %>%
  select(-4) %>%
  head()
## New names:
## * `` -> ...4
## # A tibble: 6 × 4
## Hair Eye Sex Freq
## <chr> <chr> <chr> <dbl>
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
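Dropping a column by position works, but it breaks silently when the column order changes. As a hedged sketch, the same column can also be dropped by name or by a condition on its content. The tibble `toy` and the column name `Empty` are hypothetical and only stand in for the imported data:

```r
library(dplyr)

# Hypothetical toy data with an empty fourth column
toy <- tibble(
  Hair  = c("Black", "Brown"),
  Eye   = c("Brown", "Brown"),
  Sex   = c("Male", "Male"),
  Empty = c(NA, NA),
  Freq  = c(32, 53)
)

# Drop by position, as in the text
by_position <- toy %>% select(-4)

# Drop by name, which does not depend on the column order
by_name <- toy %>% select(-Empty)

# Drop every column that contains only missing values
all_empty <- toy %>% select(!where(function(col) all(is.na(col))))

names(by_position)
```

All three variants keep the columns Hair, Eye, Sex and Freq for this toy data. Which one fits best depends on whether the position, the name, or the content of the unwanted column is the most stable property.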
There are many more functions for data manipulation in the tidyverse, but in this workshop we will focus on the use of the {ggplot2} package for plotting.
4.4 Resources on tidyverse
Tidyverse website: https://www.tidyverse.org
R for Data Science (Hadley Wickham & Garrett Grolemund): https://r4ds.had.co.nz/index.html
A ModernDive into R and the Tidyverse (Chester Ismay & Albert Y. Kim): https://moderndive.com