If you have previously attended this course and are reviewing the notes, please be aware that some of the material in this section has been reorganised. Some of the later examples have been moved to Part 3.
We should already have loaded the readr library and imported an example dataset into R:
library(readr)
gapminder <- read_csv("raw_data/gapminder.csv")
We are going to use functions from the dplyr package to manipulate the data frame we have just created. It is perfectly possible to work with data frames using the functions provided as part of "base R". However, many people find code written with dplyr easier to read and write.
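To make the comparison concrete, here is a minimal sketch that selects the same two columns first with base R indexing and then with dplyr's select(). It assumes a small made-up data frame called toy (a hypothetical name) with a few gapminder-style column names, so it runs without the CSV file. The values are placeholders, not real gapminder data.

```r
library(dplyr)

# Hypothetical toy data frame with gapminder-style column names
# (values are placeholders, not real gapminder data)
toy <- data.frame(
  country   = c("Country A", "Country B", "Country C"),
  continent = c("Asia", "Europe", "Africa"),
  year      = c(2002, 2002, 2007),
  lifeExp   = c(60.1, 75.3, 52.8)
)

# Base R: index the columns with a character vector of quoted names
base_version <- toy[, c("country", "continent")]

# dplyr: pass bare (unquoted) column names to select()
dplyr_version <- select(toy, country, continent)

# Both keep the same two columns
print(names(base_version))
print(names(dplyr_version))
```

The base R version needs quoted names inside c(), while select() takes the names directly, which is a large part of why many find dplyr code easier to read.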
There are many more functions available in dplyr than we will cover today. An overview of all the functions is given in a cheatsheet.
Before using any of these functions, we need to load the library:
library(dplyr)
Selecting columns
We can access the columns of a data frame using the select function.
Firstly, we can select columns by name, by listing bare column names (i.e. names without quote marks around them) after the name of the data frame, separated by commas (,).
select(gapminder, country, continent)
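As a small sketch of what select() returns, the example below uses a hypothetical made-up data frame (toy, with placeholder values) rather than gapminder. It shows that the columns come back in the order they are listed, and that the rows are left untouched.

```r
library(dplyr)

# Hypothetical toy data frame (placeholder values, not real gapminder data)
toy <- data.frame(
  country   = c("Country A", "Country B"),
  continent = c("Asia", "Europe"),
  year      = c(2002, 2007)
)

# Columns come back in the order they are listed, not their original order
by_name <- select(toy, continent, country)
print(names(by_name))   # "continent" "country"

# select() only changes which columns are kept; the number of rows is the same
print(nrow(by_name) == nrow(toy))
```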
As we have to type the column names manually (no auto-complete!), we have to make sure we type each name exactly as it appears in the data. If select sees a name that doesn't exist in the data frame, it should give an informative error message:
Error: Can't subset columns that don't exist.
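If you want to check for this error in a script rather than have it stop execution, one option is to wrap the call in base R's tryCatch(). This is a minimal sketch using a hypothetical toy data frame and a deliberately misspelled column name (contry). The exact wording of the message depends on your dplyr version (newer releases say "select" rather than "subset"), so treat the printed text as indicative only.

```r
library(dplyr)

# Hypothetical toy data frame (placeholder values)
toy <- data.frame(
  country   = c("Country A", "Country B"),
  continent = c("Asia", "Europe")
)

# "contry" is a deliberate typo, so select() should signal an error.
# tryCatch() captures that error and returns its message as a string
# instead of stopping the script.
result <- tryCatch(
  select(toy, contry),
  error = function(e) conditionMessage(e)
)
print(result)
```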
We can also omit columns from the output by putting a minus (-) in front of the column name. Note that this is not the same as removing the column from the data permanently: the original data frame is left unchanged.
select(gapminder, -country)
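The sketch below illustrates why dropping a column with - is not permanent. It again uses a hypothetical toy data frame with placeholder values: select() returns a new data frame, and the original keeps all of its columns unless you assign the result back to it.

```r
library(dplyr)

# Hypothetical toy data frame (placeholder values)
toy <- data.frame(
  country   = c("Country A", "Country B"),
  continent = c("Asia", "Europe"),
  year      = c(2002, 2007)
)

# Dropping a column returns a *new* data frame...
without_country <- select(toy, -country)
print(names(without_country))   # "continent" "year"

# ...but the original still has all three columns
print(names(toy))               # "country" "continent" "year"

# To make the change stick, assign the result back, e.g.
# toy <- select(toy, -country)
```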