Data Manipulation and Visualisation using R

  • Sheffield - 15th December 2017
  • 9:30am - 5pm
  • Pam Liversidge Design Studio 2 - E06

Registration

Registration is now closed. Please check back for future dates.

Overview

It has been said that 80% of the time in a data analysis is spent on cleaning and preparing the data. In this course we introduce two relatively new additions to the R ecosystem: the dplyr and ggplot2 packages. In combination, these provide a powerful toolkit that makes manipulating and visualising data easy and intuitive.

Who should attend this course?

Researchers in the life sciences who want to get started using R for their data analysis.

Aims:- After this course you should be able to:

  • Import tidy datasets into R and perform some basic data cleaning
  • Use dplyr to explore a dataset interactively
  • Produce simple analysis workflows in R
  • Make publication-ready graphics using ggplot2

Objectives:- During this course you will learn about:

  • What constitutes a tidy dataset
  • Subsetting and filtering datasets using dplyr
  • Piping commands together to form a workflow
  • Producing summary statistics from a dataset
  • Joining datasets using dplyr
  • The grammar of graphics approach to plotting used in ggplot2
  • Producing publication-ready graphics using ggplot2
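As a rough taste of these topics, here is a minimal sketch using the iris dataset that ships with R. It is not taken from the course materials: the species_info lookup table and its labels are made up for illustration, and the threshold of 5 is arbitrary.

```r
library(dplyr)
library(ggplot2)

# Subsetting and filtering: keep flowers with sepals longer than 5 cm
# and only the columns we need. Species is converted to character so
# that it can be joined to a lookup table later.
long_sepals <- iris %>%
  filter(Sepal.Length > 5) %>%
  select(Species, Sepal.Length, Petal.Length) %>%
  mutate(Species = as.character(Species))

# Piping into summary statistics: one row per species
species_summary <- long_sepals %>%
  group_by(Species) %>%
  summarise(n = n(), mean_petal = mean(Petal.Length))

# Joining: species_info is a hypothetical lookup table made up for this sketch
species_info <- data.frame(
  Species = c("setosa", "versicolor", "virginica"),
  label = c("I. setosa", "I. versicolor", "I. virginica"),
  stringsAsFactors = FALSE
)
annotated <- left_join(species_summary, species_info, by = "Species")

# Grammar of graphics: map columns to aesthetics, then add a layer and labels
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  labs(x = "Sepal length (cm)", y = "Petal length (cm)")
p
```

Each step passes its result to the next with %>%, which is what turns separate commands into a single readable workflow.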

Prerequisites

We will assume that you have a basic familiarity with R, including vectors, data frames, variables and using functions.
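If you are unsure whether you meet these prerequisites, the short sketch below may help as a self-check. It uses base R only, and the mice data are invented purely for illustration. If every line makes sense to you, you should be comfortable with the starting level of the course.

```r
# Vectors: a numeric vector and a character vector (made-up values)
weights <- c(21.5, 19.8, 23.1)
groups <- c("control", "treated", "treated")

# A data frame combining the two vectors as columns
mice <- data.frame(group = groups, weight = weights)

# Using a function and storing the result in a variable
mean_weight <- mean(mice$weight)

# Subsetting the rows of a data frame with a logical condition
treated <- mice[mice$group == "treated", ]
```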

Software installation

You will need to bring an internet-enabled laptop to the course, with the latest versions of both R and RStudio installed beforehand.

Windows

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on .exe file and select “Run as administrator” instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

Please copy and paste the following line of text into an R console to install the R packages required for the course:

install.packages(c("tidyverse","rmarkdown"))

Mac

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.

Please copy and paste the following line of text into an R console to install the R packages required for the course:

install.packages(c("tidyverse","rmarkdown"))

Linux

You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R). Also, please install the RStudio IDE.

Please copy and paste the following line of text into an R console to install the R packages required for the course:

install.packages(c("tidyverse","rmarkdown"))
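Whichever operating system you use, it is worth confirming that the installation worked before the course. The following is one possible check, not an official part of the course setup. It relies only on the base R function requireNamespace; the variable names are our own.

```r
# Packages the course asks you to install
required <- c("tidyverse", "rmarkdown")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs installing
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All course packages are installed.")
}
```

If any packages are reported as missing, re-run the install.packages line above and look for error messages in the console.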

Course Data

Please click on this link to download all the files required to run the examples in the course:- Click Here

Instructors

  • Mark Dunning, Bioinformatics Core Director
  • Katjuša Koler, PhD Student, Hide Lab, (SITraN)
  • Tim Freeman, PhD Student, Wang lab, (SITraN)

Schedule

Solutions

References

Acknowledgements

These materials were developed in collaboration with Matthew Eldridge (CRUK Cambridge Institute), Thomas Carroll (Rockefeller University) and Michael Schubert.

Feedback

  • Please give us feedback on how we can improve the course using this form

For queries relating to collaborating with the Bioinformatics Core team on projects: bioinformatics-core@sheffield.ac.uk

To be notified when we advertise talks and workshops, join our mailing list by subscribing to this Google Group. You can also connect with us on LinkedIn.

Requests for a Bioinformatics support clinic can be made via the Research Software Engineering (RSE) code clinic system. This is monitored by Bioinformatics Core staff, so we will ensure that the appropriate expertise (which may involve individuals from multiple teams) is available to help you.

Queries regarding sequencing and library preparation provision at The University of Sheffield should be directed to the Multi-omics facility in SITraN or the Genomics Laboratory in Biosciences.