Previously, we downloaded Jay’s epiRADseq data provided by the Genomic Sequencing Laboratory at UC-Berkeley. It arrived already demultiplexed (which was very nice of them!). To be completionists on our end, we requested the non-demultiplexed data set.

time wget -r -np -nc --ask-password ftp://gslftp@gslserver.qb3.berkeley.edu/160830_100PE_HS4KB_L4

It took a while:

FINISHED --2016-09-19 11:39:21--
Total wall clock time: 4h 26m 21s
Downloaded: 11 files, 36G in 4h 17m 18s (2.39 MB/s)

Here are the files:

• JD001_A_S1_L004_R2_001.fastq.gz
• JD001_A_S1_L004_R1_001.fastq.gz
• JD001_A_S1_L004_I1_001.fastq.gz
• 160830_100PE_HS4KB_L4_Stats/
  • ConversionStats.xml
  • DemultiplexingStats.xml
  • DemuxSummaryF1L4.txt
  • FastqSummaryF1L4.txt

Generated MD5 checksums for each file:

for i in *.gz; do md5 "$i" >> checksums.md5; done

Calculated the total number of reads for this sequencing run:

totalreads=0; for i in *S1*R*.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo $totalreads

Total reads: 662,868,166 (this isn’t entirely accurate, as it is counting all three files; I probably should’ve just counted the R1 and R2 files…)

Calculated read counts for each file and wrote the data to the readme.md file in the Owl/web/nightingales/Porites_spp directory:

for i in *S1*R*.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); printf "%s\t%s\n" "$i" "$readcount" >> readme.md; done
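Since the total above includes the I1 index-read file, a variant that counts only the R1 and R2 files might look like this (a sketch; the filename pattern is an assumption based on the file list above):

```shell
# Count reads in the R1 and R2 files only, excluding the I1 index reads
# (assumes gzipped FASTQ, 4 lines per read)
totalreads=0
for i in *S1*_R[12]_*.gz; do
  linecount=$(gunzip -c "$i" | wc -l)
  totalreads=$((totalreads + linecount / 4))
done
echo "$totalreads"
```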

See this Jupyter notebook for code explanations.

We received notice that Jay’s coral (Porites spp) epiRADseq data was available from the Genomic Sequencing Laboratory at UC-Berkeley.

time wget -r -np -nc -A "*.gz" --ask-password ftp://gslftp@gslserver.qb3.berkeley.edu/160830_100PE_HS4KB/Roberts

Generated MD5 checksums for each file:

for i in *.gz; do md5 "$i" >> checksums.md5; done

Calculated the total number of reads for this sequencing run:

totalreads=0; for i in *.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo $totalreads

Total reads: 573,378,864

Calculated read counts for each file and wrote the data to the readme.md file in the Owl/web/nightingales/Porites_spp directory:

for i in *.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); printf "%s\t%s\n" "$i" "$readcount" >> readme.md; done
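For verifying the archived files later on a Linux box, GNU md5sum can both generate and check checksums (a sketch; note the BSD/macOS `md5` tool used above writes a different output format, so checksums would need to be regenerated if switching tools):

```shell
# Generate checksums, then verify them in place with GNU md5sum
md5sum *.gz > checksums.md5
md5sum -c checksums.md5
```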

See this Jupyter notebook for code explanations.

# Data Management – O.lurida 2bRAD Dec2015 Undetermined FASTQ files

An astute observation by Katherine Silliman revealed that the set of FASTQ files I had moved to our high-throughput sequencing archive on our server Owl contained only two FASTQ files labeled as “undetermined”. Based on the number of lanes we had sequenced, we should have had many more. It turns out that the undetermined FASTQ files present in different sub-folders of the Genewiz project data were not uniquely named. Thus, when I moved them over (via a bash script), the undetermined files were continually overwritten, until we were left with only two FASTQ files labeled as undetermined.

So, I re-downloaded the entire project folder from the Genewiz servers, renamed the FASTQ files labeled as undetermined, and then copied them over to the archive on Owl.
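The renaming step could be sketched like this (paths and the exact naming scheme are hypothetical; the idea is to prefix each undetermined file with its parent sub-folder name so the copies no longer collide):

```shell
# Make each "Undetermined" FASTQ unique by prefixing its parent directory
# name, then copy with -n so nothing gets clobbered
for f in project_dir/*/Undetermined_*.fastq.gz; do
  parent=$(basename "$(dirname "$f")")
  cp -n "$f" "archive_dir/${parent}_$(basename "$f")"
done
```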

I also zipped up the raw data project from Genewiz and moved it to the same archive location and updated the checksum.md5 and readme.md files.

Details can be found in the Jupyter (iPython) notebook below.

# Since you’ve been gone

Soon after Ensenada I went to Chile, SICB, and PAG (in that order). The new year is often a time to let go of lingering projects, and I will likely be doing that soon. But to bring a few pending efforts to the forefront, so that I can analyze them, here is a bit of the data that is (or soon will be) coming in.
Much of this is centered around Ostrea lurida.

The first batch was 2bRAD data.

The full list of samples is here.

These raw data are here.

A quick FastQC…

We also now have a fresh set of MBD-BS libraries, now out for sequencing.
Pregame here

And just some plain old BS

Details

The full list of samples (and the individual samples/libraries/indexes) submitted to Genewiz for this project by Katherine Silliman & me can be seen here (Google Sheet): White_BS1511196_R2_barcodes

The data supplied comprised all of the Illumina output files (I’m currently not entirely sure where/how we want to store all of this, but we’ll probably want them for attempting our own demultiplexing, since there was a significant number of reads that Genewiz was unable to demultiplex), in addition to the demultiplexed FASTQ files. The FASTQ files were buried in inconvenient locations, and there are over 300 of them, so I used the power of the command line to find them and copy them to a single location: http://owl.fish.washington.edu/nightingales/O_lurida/2bRAD_Dec2015/

Find and copy all FASTQ files:

find /run/user/1000/gvfs/smb-share:server=owl.fish.washington.edu,share=home/ -name '*.fastq.*' -exec cp -n '{}' /run/user/1000/gvfs/smb-share:server=owl.fish.washington.edu,share=web/nightingales/O_lurida/ \;

Code explanation:

find
• Command line program used for searching for files
/run/user/1000/gvfs/smb-share:server=owl.fish.washington.edu,share=home/ 
• Location of the files I wanted to search through. The path looks a little crazy because I was working remotely and had the server share mounted.
-name '*.fastq.*'
• The name argument tells the find command to look for filenames containing “.fastq.” (which matches the compressed files, e.g. *.fastq.gz).
-exec cp -n '{}'
• The exec option tells the find command to execute a subsequent action upon finding a match. In this case, I’m using the copy command (cp) and telling the program not to overwrite (no-clobber, -n) any duplicate files. The '{}' is a placeholder that find fills in with each matched filename.
/run/user/1000/gvfs/smb-share:server=owl.fish.washington.edu,share=web/nightingales/O_lurida/2bRAD_Dec2015 \;
• The location where I want the matched files copied. The escaped semicolon (\;) terminates the -exec action.
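Before running the copy itself, a dry run can confirm how many files the pattern will match (a sketch; “source_dir” stands in for the long mounted-share path above):

```shell
# Count how many FASTQ files the -name pattern matches before copying
find source_dir -name '*.fastq.*' | wc -l
```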

I created a readme file in the directory with these files: readme.md

I wanted to add some information about the project to the readme file, like total number of sequencing reads generated and the number of reads in each FASTQ file.

Here’s how I counted the total number of reads generated in this project:

totalreads=0; for i in *.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo $totalreads

Code explanation:

totalreads=0;
• Creates variable called “totalreads” and initializes value to 0.
for i in *.gz;
• Initiates a for loop to process any filenames that end with “.gz”. The FASTQ files have been compressed with gzip and end with the .gz extension.
do linecount=$( … );
• Creates a variable called “linecount” that stores (via command substitution) the result of the following command:
gunzip -c "$i" | wc -l
• Unzips the file ($i) to stdout (-c) instead of actually uncompressing it. This output is piped to the word count command with the line flag (wc -l) to count the number of lines in the file.
readcount=$((linecount/4));
• Divides the value stored in linecount by 4, because an entry for a single Illumina read comprises four lines. The result is stored in the “readcount” variable.
totalreads=$((readcount+totalreads));
• Adds the current file’s read count to the running total.
done;
• End the for loop.
echo $totalreads
• Prints the value of totalreads to the screen.

Next, I wanted to generate a list of the FASTQ files and corresponding read counts, and append this information to the readme file:

for i in *.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4)); printf "%s\t%s\n\n" "$i" "$readcount" >> readme.md; done

Code explanation:

for i in *.gz; do linecount=$(gunzip -c "$i" | wc -l); readcount=$((linecount/4));
• Same for loop as above that calculates the number of reads in each FASTQ file.
printf "%s\t%s\n\n" "$i" "$readcount"
• This formats the printed output. The "%s\t%s\n\n" portion prints the value in $i as a string (%s), followed by a tab (\t), followed by the value in $readcount as a string (%s), followed by two consecutive newlines (\n\n) to provide an empty line between entries. See the readme file linked above to see how the output looks.
>> readme.md; done
• This appends the result from each loop to the readme.md file and ends the for loop (done).

# qPCR – Oly RAD-Seq Library Quantification

After yesterday’s attempt at quantification revealed insufficient dilution of the libraries, I repeated the qPCRs using 1:100000 dilutions of each of the libraries. Used the KAPA Illumina Quantification Kit (KAPA Biosystems) according to the manufacturer’s protocol.

Ran all samples, including standards, in triplicate on the Roberts Lab Opticon2 (BioRad).

Plate set up and master mix can be found here: 20151117_qPCR_plate_layout_Oly_RAD.JPG

Results:

Overall, the new dilutions worked well, with all the library samples coming up between Ct 9 and 15, which is well within the range of the standard curve.

Manually adjusted the baseline threshold to be above any background fluorescence (see images below).

All samples, except Oly RAD 30, exhibit two peaks in the melt curve indicating contaminating primer dimers. Additionally, the peak heights appear to be roughly equivalent. Can we use this fact to effectively “halve” the concentration of our sample to make a rough estimate of library-only PCR products?

Here are the calculated library concentrations, based on the KAPA Biosystems formulas:

| Library | Library Stock Conc. (nM) | Stock Halved (nM) |
|---|---|---|
| Oly RAD 02 | 46.70 | 23.35 |
| Oly RAD 03 | 79.35 | 39.67 |
| Oly RAD 04 | 61.35 | 30.67 |
| Oly RAD 06 | 30.61 | 15.30 |
| Oly RAD 07 | 477.05 | 238.53 |
| Oly RAD 08 | 46.32 | 23.16 |
| Oly RAD 14 | 224.91 | 112.46 |
| Oly RAD 17 | 24.56 | 12.28 |
| Oly RAD 23 | 49.56 | 24.78 |
| Oly RAD 30 | 11.19 | NA |
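The “halved” estimate is just the arithmetic from the primer-dimer reasoning above: if the dimer and the library contribute roughly equally to the fluorescence, the library-only concentration is approximately half the calculated stock. A one-line sketch using the Oly RAD 02 value:

```shell
# Halve the calculated stock concentration (nM) to estimate the
# library-only contribution, e.g. for Oly RAD 02
awk 'BEGIN { printf "%.2f\n", 46.70 / 2 }'
# → 23.35
```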

Amplification plots of standard curve samples:

Melt curve plots of standard curve samples. Shows expected “shoulder” to the left of the primary peak:

Amplification plots of RAD library samples:

Melt curve plots of RAD library samples. Peak on the right corresponds to primer dimer. Peak heights between primer dimer and desired PCR product are nearly equivalent for each respective sample, suggesting that each product is contributing equally to the fluorescence generated in the reactions:

Melt curve plot of Oly RAD library 30. Notice there’s only a single peak due to the lack of primer dimers in this sample:

# qPCR – Oly RAD-Seq Library Quantification

The final step before sequencing these 2bRAD libraries is to quantify them. Used the KAPA Illumina Quantification Kit (KAPA Biosystems) according to the manufacturer’s protocol.

Made 1:4 dilutions of each library to use as template.

Ran all samples, including standards, in triplicate on the Roberts Lab Opticon2 (BioRad).

Plate set up and master mix can be found here: 20151116_qPCR_plate_layout_Oly_RAD.JPG

Results:

The take home messages from this qPCR are this:

• The amplification plots that are pushed up against the left side of the graph (essentially at ~ cycle 1) are all of the libraries. A 1:4 dilution was insufficient to have the libraries amplify within the range of the standard curve.
• All libraries except one (Oly RAD Library 30) have detectable levels of primer dimer. This confounds library quantification (because both the intended PCR product and the primer dimers contribute to the fluorescence accumulation), as well as potentially interfering with the subsequent Illumina sequencing (primer dimers will be sequenced and contain no insert sequence).

Will repeat the qPCR with more appropriately diluted libraries.

See the info below for more deets on this run.

Default analysis settings need to be adjusted to account for how early the standard curve comes up. Otherwise, the Opticon software sets the baseline incorrectly:

The KAPA Quantification Kit indicates that the baseline calculations need to be extended to cycles 1 through 3. This allows the software to set the baseline threshold correctly:

Melt curve analysis of the standard curve shows the expected profile – slight hump leading into the peak:

Melt curve analysis of the libraries. Dual peaks indicate primer dimer contamination:

Melt curve analysis of Oly RAD Library 30. Shows the desired single peak, suggesting library is free of primer dimers:

# Gel Extraction – Oly RAD-Seq Prep Scale PCR

Extracted the PCR products from the gel slices from 20151113 using the QIAquick Gel Extraction Kit (Qiagen) according to the manufacturer’s protocol. Substituted MinElute columns so that I could elute with a smaller volume than what is used in the standard QIAquick protocol.

Samples were eluted with 20μL of Buffer EB.

Will quantify these libraries via qPCR.

# PCR – Oly RAD-seq Prep Scale PCR

Continuing with the RAD-seq library prep. Following the Meyer Lab 2bRAD protocol.
After determining yesterday the minimum number of PCR cycles needed to generate a visible, 166bp band on a gel, I ran a full library “prep scale” PCR.

| REAGENT | SINGLE REACTION (μL) | x11 (μL) |
|---|---|---|
| Template | 40 | NA |
| ILL-HT1 (1μM) | 5 | 55 |
| ILL-BC# (1μM) | 5 | NA |
| NanoPure H2O | 5 | 55 |
| dNTPs (1mM) | 20 | 220 |
| ILL-LIB1 (10μM) | 2 | 22 |
| ILL-LIB2 (10μM) | 2 | 22 |
| 5x Q5 Reaction Buffer | 20 | 220 |
| Q5 DNA Polymerase | 1 | 11 |
| TOTAL | 100 | 550 |

Combined the following for PCR reactions:

• 55μL PCR master mix
• 40μL ligation mix
• 5μL of ILL-BC# (1μM) – The barcode number and the respective sample are listed below.
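As a quick sanity check on the volumes, the three components combined per reaction should equal the 100 μL single-reaction total from the table above:

```shell
# 55 uL master mix + 40 uL ligation/template + 5 uL barcode
echo $((55 + 40 + 5))
# → 100
```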

Cycling was performed on a PTC-200 (MJ Research) with a heated lid:

| STEP | TEMP (°C) | TIME (s) |
|---|---|---|
| Initial Denaturation | 98 | 30 |
| 17 cycles: | 98 | 5 |
| | 60 | 20 |
| | 72 | 10 |