Introduction to R - Part II

If you have previously attended this course and are reviewing the notes, please be aware that the materials in this section have been reorganised. Some of the later examples have been moved to Part 3.

By this point we should have loaded the readr library and imported an example dataset into R:

library(readr)
gapminder <- read_csv("raw_data/gapminder.csv")

Manipulating Columns

We are going to use functions from the dplyr package to manipulate the data frame we have just created. It is perfectly possible to work with data frames using the functions provided as part of “base R”. However, many find it easier to read and write code using dplyr.

There are many more functions available in dplyr than we will cover today. An overview of all functions is given in a cheatsheet.

  • dplyr cheatsheet. The cheatsheet is also available through the RStudio Help menu.

Before using any of these functions, we need to load the library:

library(dplyr)

Selecting columns

We can access the columns of a data frame using the select function.

By name

Firstly, we can select columns by name. We add bare column names (i.e. without quote marks around the name) after the name of the data frame, separated by commas.

select(gapminder, country, continent)
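For comparison, here is a minimal sketch of the same selection written with dplyr and with base R. The data frame toy and its values are made up for illustration (it is a hypothetical stand-in for gapminder), so the code runs without the CSV file.

```r
library(dplyr)

# Hypothetical toy data frame standing in for gapminder (values are made up)
toy <- tibble(
  country   = c("A", "B", "C"),
  continent = c("X", "X", "Y"),
  year      = c(2000, 2000, 2000)
)

# dplyr: bare column names, no quote marks
by_dplyr <- select(toy, country, continent)

# base R: quoted column names inside square brackets
by_base <- toy[, c("country", "continent")]

# Check whether both approaches returned the same column names
identical(names(by_dplyr), names(by_base))
```

Both versions keep only the two requested columns. The dplyr version reads more like a sentence, which is the main reason it is used in these notes.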

As we have to type the column names manually (no auto-complete!), we have to make sure we type each name exactly as it appears in the data. If select sees a name that doesn’t exist in the data frame, it should give an informative message such as Error: Can't subset columns that don't exist (the exact wording varies between dplyr versions).
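To see what a mistyped column name looks like in practice, the sketch below deliberately misspells a name and captures the resulting error message with tryCatch. The data frame toy and the misspelling countri are hypothetical, and the exact text of the message depends on the installed version of dplyr.

```r
library(dplyr)

# Hypothetical toy data frame (values are made up)
toy <- tibble(
  country   = c("A", "B"),
  continent = c("X", "Y")
)

# "countri" is a deliberate typo, so select() raises an error.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(
  {
    select(toy, countri)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Print the captured message; it names the column that could not be found
msg
```

When this happens in your own code, names(gapminder) or colnames(gapminder) will list the exact spellings that select expects.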

We can also omit columns from the output by putting a minus (-) in front of the column name. Note that this is not the same as removing the column from the data permanently.

select(gapminder, -country)
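The following sketch illustrates why omitting a column with a minus is not permanent: select returns a new data frame and leaves the original untouched unless the result is assigned. The names toy and toy_no_country are hypothetical, and the values are made up.

```r
library(dplyr)

# Hypothetical toy data frame (values are made up)
toy <- tibble(
  country   = c("A", "B", "C"),
  continent = c("X", "X", "Y"),
  year      = c(2000, 2000, 2000)
)

# This prints a data frame without the country column...
select(toy, -country)

# ...but toy itself is unchanged and still has all three columns
names(toy)

# To keep the reduced data frame, assign the result to a variable
toy_no_country <- select(toy, -country)
names(toy_no_country)
```

Assigning to a new variable, rather than overwriting toy, keeps the original data available for later steps.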