Tag Archives: reproducibility

Docker – Improving Roberts Lab Reproducibility

In an attempt to further our lab’s reproducibility, I’ve been working on developing an all-encompassing Docker image. Docker runs software in containers, which behave much like lightweight virtual machines (i.e. a self-contained computer that runs within your computer). For the Roberts Lab, the advantage of using Docker is that Docker images can be customized to run a specific suite of software, and these images can then be used by anyone else in the lab, regardless of operating system (assuming they can run Docker on it). In turn, if everyone is using the same Docker image (i.e. the same virtual computer with all the same software), then we should be able to reproduce data analyses more reliably, because there won’t be differences between the software versions that people are using. Additionally, using Docker greatly simplifies setting up new computers with the requisite software.

I’ve put together a Dockerfile (a text file/script that Docker uses to retrieve software and build an image according to its instructions) which will automatically build a Docker image (i.e. virtual computer) that contains all of the normal bioinformatics software our lab uses. This has been a side project while I wait for Stacks analysis to complete (or fail, depending on the day) and it’s finally usable! The image that is built from this Dockerfile will even let the user run RStudio and/or Jupyter Notebooks in their browser (I’m excited about this part)!

Here’s the current list of software that will be installed:

bedtools 2.25.0
bismark 0.15.0
blast 2.3.0+
bowtie2 2.2.8
bsmap 2.90
cufflinks 2.1.1
fastqc 0.11.5
fastx_toolkit 0.0.13
R 3.2.5
RStudio Server 0.99
pyrad 3.0.66
samtools 0.1.19
stacks 1.40
tophat 2.1.1
trimmomatic 0.36

In order to set this up, you need to install Docker and download the Dockerfile (Dockerfile.bio) I’ve created.
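As a rough illustration of the build step, here is a minimal Python sketch that assembles the `docker build` command for this Dockerfile. The image tag `roberts-lab/bio` and the use of the current directory as the build context are my own placeholders, not names taken from the user guide; the sketch only prints the command rather than running it.

```python
import shlex


def docker_build_command(dockerfile="Dockerfile.bio",
                         tag="roberts-lab/bio",  # hypothetical image tag
                         context="."):           # assumed build context
    """Return the argument list for building an image from a Dockerfile."""
    return ["docker", "build", "--file", dockerfile, "--tag", tag, context]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a terminal.
    print(shlex.join(docker_build_command()))
```

Pass the returned list to `subprocess.run(...)` if you would rather have Python launch the build directly.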

I’ve written a bit of a user guide (specific to this Dockerfile) here to get people started: docker.md

The user guide explains a bit how all of this works and tries to progress from a “basic” this-is-how-to-get-started-with-Docker to an “advanced” description of how to map ports, mount local volumes in your containers, and how to start/attach previously used containers.
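To give a flavor of the "advanced" topics, here is a small Python sketch that assembles the Docker commands for mapping ports, mounting a local volume, and getting back into a previously used container. The image tag, container name, and host directory are hypothetical, and the ports are only the usual defaults for Jupyter (8888) and RStudio Server (8787); the user guide remains the reference for what this particular image expects.

```python
import shlex


def docker_run_command(image, ports=(), volumes=(), name=None):
    """Build a `docker run` argument list for an interactive container.

    ports:   iterable of (host_port, container_port) pairs
    volumes: iterable of (host_dir, container_dir) pairs
    """
    cmd = ["docker", "run", "--interactive", "--tty"]
    if name:
        cmd += ["--name", name]
    for host_port, container_port in ports:
        cmd += ["--publish", f"{host_port}:{container_port}"]
    for host_dir, container_dir in volumes:
        cmd += ["--volume", f"{host_dir}:{container_dir}"]
    cmd.append(image)
    return cmd


def docker_resume_commands(name):
    """Commands to restart a stopped container and attach to it."""
    return [["docker", "start", name], ["docker", "attach", name]]


if __name__ == "__main__":
    run = docker_run_command(
        "roberts-lab/bio",                        # hypothetical image tag
        ports=[(8888, 8888), (8787, 8787)],       # assumed Jupyter / RStudio ports
        volumes=[("/home/sam/data", "/data")],    # hypothetical host directory
        name="bio",                               # hypothetical container name
    )
    print(shlex.join(run))
    for cmd in docker_resume_commands("bio"):
        print(shlex.join(cmd))
```

Giving the container a name up front is what makes the later start/attach step convenient, since you don't have to look up a container ID.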

The next major goal I have with this Docker project is to get the R kernel installed for Jupyter Notebooks. Currently, the Jupyter Notebook installation is restricted to the default Python 2 kernel.

Additionally, I’d like to improve the usability of the Docker image by setting up aliases in the image, so that a user who wants to use the bowtie program can just type “bowtie”. Currently, the user has to type “bowtie2_2.2.8”, which is a bit ugly (although, with this being in the system PATH and tab-completion available, it’s not that big of a deal).
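One possible way to generate such aliases is sketched below in Python. It assumes that every versioned executable follows a `<tool>_<version>` naming pattern, which I'm extrapolating from the single bowtie example; the function names are made up for illustration. Note that this simple rule only strips the version suffix, so it would produce `bowtie2` rather than `bowtie`.

```python
import re

# Matches a trailing "_<version>" such as "_2.2.8" or "_2.3.0+".
_VERSION_SUFFIX = re.compile(r"_\d+(?:\.\d+)*\+?$")


def short_name(executable):
    """Strip a trailing version suffix, e.g. 'bowtie2_2.2.8' -> 'bowtie2'."""
    return _VERSION_SUFFIX.sub("", executable)


def alias_lines(executables):
    """Return shell alias lines for executables that carry a version suffix."""
    return [
        f"alias {short_name(exe)}='{exe}'"
        for exe in executables
        if short_name(exe) != exe
    ]


if __name__ == "__main__":
    # The second and third names are hypothetical examples of the assumed pattern.
    for line in alias_lines(["bowtie2_2.2.8", "samtools_0.1.19", "fastx_toolkit_0.0.13"]):
        print(line)
```

The printed lines could be appended to a shell startup file inside the image, although symlinks in a directory on the PATH would work just as well.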

For some next-level stuff, I’d also like to set up all Roberts Lab computers to automatically launch the Docker image when the user opens a terminal. This would greatly simplify things for new lab members. They wouldn’t have to go through the various Docker commands to start a Docker container. Instead, their terminal would just put them directly into the container and the user would be none the wiser. They’d be reproducibly conducting data analysis without even having to think about it.


qPCR – Repeat of Earlier qPCR with New Sample

Re-ran the qPCR (see earlier entry from today), due to the Low Feces sample failing to amplify. Replaced it with:

R3E 5/14/09 – 10^0

Everything else (including master mix, cycling params, etc) is all the same. See the earlier entry for details.

Results:

qPCR Data File (CFX96): Sam_2012-10-22 13-37-42_CC009827.pcrd

qPCR Report (PDF): Sam_2012-10-22 13-37-42_CC009827.pdf

Things looked good. The replacement Low Feces sample amplified.


qPCR – Withering Syndrome qPCR Assay Validation: Reproducibility (CFX 96)

Ran qPCR for the reproducibility aspect of the WSN qPCR Assay Validation. Master mix calcs are here. Plate layout, cycling params, etc. can be found in the Results (see below).

Standard curve was the p16RK7 NcoI-linearized curve made on 20120730.

Baseline threshold was set to 400 and cycles to analyze was set to 41.

Samples used for “low”, “medium” and “high” copy numbers for each sample type are below, with expected fold copy number (based off of previous qPCRs):

Feces

Low: R4E 4/17/09 – 10^0

Med: R3E 7/23/09 – 10^3

High: R4E 7/23/09 – 10^4

Tissue

Low: 09:16-18 – 10^1

Med: 09:16-22 – 10^2

High: 09:20-11 – 10^5

Water

Low: 494:11-11 – 10^0

Med: 494-11-12 – 10^2

High: TAF SD A2 – 10^3

Results:

qPCR Data File (CFX96): Sam_2012-10-22 16-16-19_CC009827.pcrd

qPCR Report (PDF): Sam_2012-10-22 16-16-19_CC009827.pdf

Everything looked good except for the Low Feces sample, which didn’t produce any amplification. Will identify another sample to use for the Low Feces sample.
