Part 2 Solutions

### Load the libraries we will need
library(readr)
library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(ggplot2)

Read the data and check

gapminder <- read_csv("raw_data/gapminder.csv")
Rows: 1704 Columns: 6
── Column specification ──────────────────────────────────────────────────
Delimiter: ","
chr (2): country, continent
dbl (4): year, lifeExp, pop, gdpPercap
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(gapminder)

In-class exercises

Create a subset of the data where the population is less than a million in the year 2002

filter(gapminder, pop < 1e6, year == 2002)

Create a subset of the data where the life expectancy is greater than 75 in the years prior to 1987

filter(gapminder, lifeExp > 75, year < 1987)

Create a subset of the European data where the life expectancy is between 75 and 80 in the years 2002 or 2007.

filter(gapminder, continent == "Europe", lifeExp > 75, lifeExp < 80 , year == 2002 | year == 2007)

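You can also use the between function from dplyr and the %in% operator. Note that between includes both endpoints (75 and 80), whereas the > and < comparisons above exclude them.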

filter(gapminder, continent == "Europe", 
       between(lifeExp, 75,80), 
       year %in% c(2002,2007))
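The difference between the strict comparisons and between only shows up for values that sit exactly on a boundary. The following is a minimal sketch on a made-up data frame (the `toy` object and its values are assumptions for illustration, not gapminder rows):

```r
library(dplyr)

# Hypothetical toy data with values on and around the boundaries
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  lifeExp = c(74.9, 75, 79.5, 80),
  stringsAsFactors = FALSE
)

# Strict comparisons drop rows equal to 75 or 80
strict <- filter(toy, lifeExp > 75, lifeExp < 80)

# between(x, left, right) is equivalent to x >= left & x <= right
inclusive <- filter(toy, between(lifeExp, 75, 80))

strict$country      # "C"
inclusive$country   # "B" "C" "D"
```

Which version is appropriate depends on whether the exercise intends "between 75 and 80" to include the endpoints.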

Write a workflow to do the following:

  • Filter the data to include just observations from the year 2002
  • Re-arrange the table so that the countries from each continent are ordered according to decreasing wealth, i.e. the wealthiest countries first
  • Select all the columns apart from year
  • Write the data frame out to a file in the out_data/ folder
# Step-by-step solution using intermediate variables, before pipes are introduced

# create out_data folder before we start (no warning given if it already exists)
dir.create("out_data", showWarnings = FALSE)

gapminder2 <- filter(gapminder, year == 2002)
gapminder3 <- arrange(gapminder2, continent, desc(gdpPercap))
gapminder4 <- select(gapminder3, -year)
write_csv(gapminder4, "out_data/gapminder_2002.csv")

Re-written using pipes

filter(gapminder, year == 2002) %>% 
  arrange(continent, desc(gdpPercap)) %>% 
  select(-year) %>% 
  write_csv("out_data/gapminder_piped_2002.csv")
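To check that the piped and step-by-step versions really do the same thing, the two can be compared on a small data frame. This is only a sketch: `toy` and its values are made up for illustration, and the file-writing step is left out so that nothing is written to disk.

```r
library(dplyr)

# Hypothetical toy data with the same column names as gapminder
toy <- data.frame(
  country   = c("A", "B", "C", "D", "E"),
  continent = c("Asia", "Europe", "Asia", "Europe", "Asia"),
  year      = c(2002, 2002, 2002, 2002, 1997),
  gdpPercap = c(1000, 30000, 5000, 20000, 800),
  stringsAsFactors = FALSE
)

# Step-by-step version with intermediate variables
step1 <- filter(toy, year == 2002)
step2 <- arrange(step1, continent, desc(gdpPercap))
stepwise <- select(step2, -year)

# Piped version: each result is passed as the first argument of the next call
piped <- toy %>%
  filter(year == 2002) %>%
  arrange(continent, desc(gdpPercap)) %>%
  select(-year)

# Both approaches should give the same table
all.equal(stepwise, piped)

# Within each continent, the wealthiest country comes first
piped$country   # "C" "A" "B" "D"
```

The piped version avoids creating `step1` and `step2`, which are never needed again.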

The violin plot is a popular alternative to the boxplot. Create a violin plot with geom_violin to visualise the differences in GDP per capita between continents.

ggplot(gapminder, aes(x = continent, y = gdpPercap)) + geom_violin()

Create a subset of the gapminder data frame containing just the rows for your country of birth


# don't forget that R is case-sensitive!

uk_data <- filter(gapminder, country == "United Kingdom")

Has there been an increase in life expectancy over time? Visualise the trend using a scatter plot (geom_point), line graph (geom_line) or smoothed line (geom_smooth).

ggplot(uk_data, aes(x = year, y = lifeExp)) + geom_point()

ggplot(uk_data, aes(x = year, y = lifeExp)) + geom_line()

ggplot(uk_data, aes(x = year, y = lifeExp)) + geom_smooth()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

## can combine several geoms in one plot
ggplot(uk_data, aes(x = year, y = lifeExp)) + geom_point() + geom_smooth()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

Note: this exercise could also make use of the piping technique

filter(gapminder, country == "United Kingdom") %>% 
  ggplot(aes(x = year, y = lifeExp)) + geom_point() + geom_smooth()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

What happens when you modify the geom_boxplot example to compare the GDP distributions for different years? Look at the message ggplot2 prints above the plot and try to modify the code to give a separate boxplot for each year.

# this is how we might expect the code to look
ggplot(gapminder, aes(x = year, y = gdpPercap)) + geom_boxplot()
Warning: Continuous x aesthetic -- did you forget aes(group=...)?

The warning above hints that you need to group by year; otherwise ggplot2 treats year as a continuous variable and does not split the data into one box per year.

ggplot(gapminder, aes(x = year, y = gdpPercap, group=year)) + geom_boxplot()

A common alternative is to use the as.factor function to convert year into a categorical variable.

ggplot(gapminder, aes(x = as.factor(year), y = gdpPercap)) + geom_boxplot()
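To see what as.factor does to a numeric column, it can help to try it on a short vector first. The sketch below uses a made-up `years` vector rather than the gapminder data:

```r
# Hypothetical vector of years with repeated values
years <- c(1952, 1957, 1952, 2007, 1957)

is.numeric(years)            # TRUE: treated as a continuous quantity

# Convert to a factor: each distinct year becomes one category (level)
year_f <- as.factor(years)

levels(year_f)               # "1952" "1957" "2007"
nlevels(year_f)              # 3

# Count how many observations fall into each category
table(year_f)
```

Because a factor has a fixed set of levels, ggplot2 can place one box per level on a discrete x axis.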

Homework

Task 1

Add an extra column containing the first letter of each country name, then keep only the countries beginning with "Z". This version assigns a new variable at each step.

gapminder2 <- mutate(gapminder, FirstLetter = substr(country, 1,1))
gapminder3 <- filter(gapminder2, FirstLetter == "Z")
gapminder3

A more concise solution using pipes

gapminder %>% 
  mutate(FirstLetter = substr(country,1,1)) %>% 
  filter(FirstLetter == "Z")
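The substr call does the real work here, so it is worth trying on a plain character vector. A minimal sketch, assuming a made-up `countries` vector:

```r
# Hypothetical vector of country names
countries <- c("Zambia", "Zimbabwe", "Albania", "Brazil")

# substr(x, start, stop) extracts characters start..stop from each string
first_letter <- substr(countries, 1, 1)
first_letter                         # "Z" "Z" "A" "B"

# Keep only the names whose first letter is "Z" (comparison is case-sensitive)
z_countries <- countries[first_letter == "Z"]
z_countries                          # "Zambia" "Zimbabwe"

# Base R's startsWith() gives the same logical vector without an extra column
same <- identical(startsWith(countries, "Z"), first_letter == "Z")
```

Inside mutate, substr is applied to the whole country column in the same vectorised way.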

Task 2 - Heatmap of life expectancy

## Get the European countries
filter(gapminder, continent == "Europe") %>% 
  ## make heatmap. Set the fill aesthetic to be life expectancy
  ggplot(aes(x = year, y = country, fill = lifeExp)) + geom_tile()
