Introduction

Installing Bioinformatics tools can be a major headache and frustration, even for experienced Bioinformaticians. In this brief tutorial we explain how you can run command-line Bioinformatics tools on your own desktop or laptop, or on the University of Sheffield computing cluster, with minimal setup.

We assume that you have some familiarity with the Unix command-line interface and know which commands you want to run. You can get a primer from the Software Carpentry organisation, for example.

The particular example we give is for an RNA-seq analysis. If you have a different analysis to perform, please check out the RNA-seq instructions before seeing what other software we have made available.

Disclaimer: this is not a tutorial on how to perform an RNA-seq analysis or which tools to use. You should use this tutorial when you are familiar with the workflow and want to run the tools on your own data.

Video walkthrough

Setup on your own machine

A Virtual-Machine approach (e.g. using VirtualBox) could be used, but we will consider a solution using “Docker”.

Docker is an open platform for developers to build and ship applications, whether on laptops, servers in a data center, or the cloud. It is a (relatively) painless way for you to install and try out Bioinformatics software. You can think of it as an isolated environment inside your existing operating system where you can install and run software without messing with the main OS.

It is worth bearing in mind that running Docker on your own machine uses that machine's CPU and RAM, so if you do not have adequate resources, some of the analyses may struggle and you might have to consider using a computing cluster.

Installing Docker

Choose the appropriate link below to install Docker on your machine.

Windows

Once you have installed Docker using the instructions above, you can open a terminal (Mac) or command prompt (Windows; search for the CMD program) and type the following to check that everything is working:

docker run hello-world
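If the hello-world test fails, it can help to check whether the docker command is visible at all before investigating further. The following is a minimal sketch for a bash-like shell (a Mac or Linux terminal, or Git Bash on Windows; it will not work in the CMD program). The helper name docker_status is our own invention for illustration and is not part of Docker.

```shell
# docker_status is a hypothetical helper name, not part of Docker itself.
docker_status() {
    if command -v docker >/dev/null 2>&1; then
        echo "docker found: $(docker --version)"
    else
        echo "docker not found: check the installation"
    fi
}

docker_status

# If Docker is running, report the CPUs and memory (in bytes) it can use.
if command -v docker >/dev/null 2>&1; then
    docker info --format 'CPUs: {{.NCPU}}, memory: {{.MemTotal}}' \
        || echo "docker is installed but does not seem to be running" >&2
fi
```

On Mac and Windows, the figures reported by docker info are likely to reflect the resources allotted to Docker rather than the whole machine, which is relevant to the earlier point about CPU and RAM.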

Using the environment to analyse your own data

hello-world is a pre-built container that prints a “Hello world” message to the screen. Many popular software packages and pipelines (not just in Bioinformatics) are distributed using Docker, in particular through the Docker Hub website. Our container for RNA-seq analysis is available at sheffieldbioinformatics/rnaseq-training.
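Docker downloads a container image the first time you run it, but you may prefer to fetch it ahead of time, for instance while you have a good network connection. A minimal sketch, assuming a bash-like shell; fetch_image is a hypothetical helper name of our own, while docker pull and docker images are standard Docker commands.

```shell
IMAGE="sheffieldbioinformatics/rnaseq-training"

# fetch_image is a hypothetical helper: download an image, then list the local copy.
fetch_image() {
    docker pull "$1" && docker images "$1"
}

if command -v docker >/dev/null 2>&1; then
    fetch_image "$IMAGE" || echo "could not download $IMAGE" >&2
else
    echo "docker not found: install Docker first" >&2
fi
```

The download can be large, so it may take several minutes.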

With the default settings, the “container” is isolated from your own machine; we can neither bring files that we create back to our own OS, nor analyse our own data.

However, adding a -v argument allows certain folders on your own OS to be visible within the environment.

Let’s assume the files I want to analyse are in the folder /c/work/my_fastq_data. Here’s what the files look like on my Windows machine.

The following command would map that directory to the folder /data inside the docker container:

docker run --rm -it -v /c/work/my_fastq_data:/data sheffieldbioinformatics/rnaseq-training

Notice how the command prompt changes to indicate that I am now the root user within a different file system.

N.B. the other options used here are --rm, which deletes the container once you exit, and -it, which makes the session interactive.
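The -v argument always takes the form host_folder:container_folder, so only the part before the colon changes when your data live somewhere else. The sketch below just prints the command for a given folder rather than running it, so that you can check it before pasting it into your terminal. The helper name docker_cmd is hypothetical, and a bash-like shell is assumed.

```shell
IMAGE="sheffieldbioinformatics/rnaseq-training"

# docker_cmd is a hypothetical helper: print the docker run command
# that maps a folder on your own machine to /data in the container.
docker_cmd() {
    host_dir="$1"    # folder on your own machine
    echo "docker run --rm -it -v ${host_dir}:/data ${IMAGE}"
}

# The folder used in this tutorial
docker_cmd /c/work/my_fastq_data

# The folder you are currently in
docker_cmd "$PWD"
```

If the folder name contains spaces, the host_folder:container_folder pair will need to be wrapped in quotes when you run the command.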

We should now be able to see our files by running the ls command on the directory /data/.

cd /data/
ls

The fastqc tool is available inside the container, so I can now run it to perform a QC check on my files.

cd /data
fastqc *.fastq.gz
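By default fastqc writes its reports alongside the input files. If you would rather keep them in a separate folder, fastqc has an -o option for this; the folder must already exist. A minimal sketch, in which run_qc and the folder name qc_reports are hypothetical names chosen for illustration:

```shell
# run_qc and qc_reports are hypothetical names chosen for this sketch.
run_qc() {
    data_dir="$1"                     # folder holding the fastq files
    out_dir="$data_dir/qc_reports"    # where the reports should go
    mkdir -p "$out_dir"               # fastqc expects this folder to exist
    if command -v fastqc >/dev/null 2>&1; then
        fastqc -o "$out_dir" "$data_dir"/*.fastq.gz
    else
        echo "fastqc not found: run this inside the container" >&2
    fi
}
```

Inside the container you would call run_qc /data. Because /data is mapped to /c/work/my_fastq_data, the reports would then be found in a qc_reports folder inside that directory.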

Conveniently, the results appear in the directory on our Windows machine.