Tag Archives: MBD-Seq

Data Management – Concatenate FASTQ files from Oly MBDseq Project

Steven requested I concatenate the MBDseq files we received for this project:

  • concatenate the s4, s5, s6 file sets for each individual

  • concatenate the full file sets for each individual

Ran the concatenations in the Jupyter (iPython) notebook below. All files were saved to Owl/nightingales/O_lurida/2016

Jupyter Notebook: 20160411_Concatenate_Oly_MBDseq.ipynb

NBviewer: 20160411_Concatenate_Oly_MBDseq
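The actual commands are in the notebook linked above. As a rough sketch of the idea only: gzipped FASTQ files can be joined with cat, because concatenated gzip streams are still a valid gzip file. The function name concat_fastq and the file names in the usage comment are hypothetical, not the names used in the notebook.

```shell
# Sketch only: concat_fastq and the example file names are assumptions.
# $1 = output file, remaining arguments = input .fastq.gz files (in order).
concat_fastq() {
  out="$1"
  shift
  # gzip members can be concatenated byte-for-byte; no need to decompress.
  cat "$@" > "$out"
}

# Hypothetical usage, joining the s4, s5 and s6 sets for one individual:
# concat_fastq sampleA_s456.fastq.gz sampleA_s4.fastq.gz sampleA_s5.fastq.gz sampleA_s6.fastq.gz
```

The output name should not match the input pattern, otherwise a glob could pick up the output file as an input.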


Data Received – Ostrea lurida MBD-enriched BS-seq

Received the Olympia oyster, MBD-enriched BS-seq sequencing files (50bp, single read) from ZymoResearch (submitted 20151208). Here’s the sample list:

  • E1_hc1_2B
  • E1_hc1_4B
  • E1_hc2_15B
  • E1_hc2_17
  • E1_hc3_1
  • E1_hc3_5
  • E1_hc3_7
  • E1_hc3_10
  • E1_hc3_11
  • E1_ss2_9B
  • E1_ss2_14B
  • E1_ss2_18B
  • E1_ss3_3B
  • E1_ss3_14B
  • E1_ss3_15B
  • E1_ss3_16B
  • E1_ss3_20
  • E1_ss5_18

 

The 18 samples listed above had previously been MBD-enriched and then sent to ZymoResearch for bisulfite conversion, multiplex library construction, and subsequent sequencing. The library (multiplex of all samples) was sequenced in a single lane, three times. Thus, we would expect 54 FASTQ files. However, ZymoResearch was dissatisfied with the QC of the initial sequencing run (completed on 20160129), so they re-ran the samples (completed on 20160202). This created two sets of data, resulting in a total of 108 FASTQ files.

The ZymoResearch data portal does not allow bulk download of files. However, I ended up using the Chrono Download Manager extension for Google Chrome to automate downloading of each file (per ZymoResearch recommendation).

After download, the files were moved to their permanent storage location on Owl: http://owl.fish.washington.edu/nightingales/O_lurida/20160203_mbdseq

The readme.md file was updated to include project/file information.

The file manipulations were performed in a Jupyter notebook (see below).

 

Total reads generated for this project: 1,481,836,875
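The read total was calculated in the notebook linked below. For reference, one common way to count reads in gzipped FASTQ files is to count lines and divide by four, since each record spans four lines. This is a minimal sketch under that assumption (no wrapped sequence lines); count_reads is a hypothetical helper name.

```shell
# Sketch only: count_reads is a hypothetical helper, not from the notebook.
# Arguments: one or more gzipped FASTQ files.
count_reads() {
  # Decompress to stdout, count lines, divide by 4 (lines per FASTQ record).
  # %.0f avoids integer-size limits in some awk implementations.
  gzip -dc "$@" | awk 'END { printf "%.0f\n", NR / 4 }'
}

# Hypothetical usage over a whole directory:
# count_reads *.fastq.gz
```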

 

Jupyter Notebook file: 20160203_Olurida_Zymo_Data_Handling.ipynb

Notebook Viewer: 20160203_Olurida_Zymo_Data_Handling.ipynb


DNA Quantification – MBD-enriched Olympia oyster DNA

Quantified the MBD-enriched samples prepped over the last two days: MBD enrichment, EtOH precipitation.

Samples were quantified using the QuantIT dsDNA BR Kit (Invitrogen) according to the manufacturer’s protocol.

Standards were run in triplicate, samples were run in duplicate.

96-well black (opaque) plate was used.

Fluorescence was measured on the Seeb Lab’s Victor 1420 plate reader (Perkin Elmer).

Results:

Google Sheet: 20151123_MBD_libraries_quantification

Standard curve looked good – R² = 0.999

MBD recovery ranged from ~250 – 600ng.

MBD percent recoveries ranged from ~2 – 20%. Input DNA quantities were taken from Katherine’s numbers (Google Sheet): Silliman-DNA-Samples
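Percent recovery here is simply recovered DNA divided by input DNA, times 100. A small sketch of that arithmetic follows; the numbers in the usage comment are made up for illustration and are not values from the Google Sheets, and percent_recovery is a hypothetical helper name.

```shell
# Sketch only: percent_recovery is a hypothetical helper.
# $1 = recovered DNA (ng), $2 = input DNA (ng)
percent_recovery() {
  awk -v rec="$1" -v inp="$2" 'BEGIN { printf "%.1f\n", rec / inp * 100 }'
}

# Illustrative (made-up) numbers: 500ng recovered from 4000ng of input
# percent_recovery 500 4000
```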

Will contact services about getting bisulfite Illumina sequencing performed.


Quality Trimming – LSU C.virginica Oil Spill MBD BS-Seq Data

Jupyter (IPython) Notebook: 20150414_C_virginica_LSU_Oil_Spill_Trimmomatic_FASTQC.ipynb

NBviewer: 20150414_C_virginica_LSU_Oil_Spill_Trimmomatic_FASTQC.ipynb

Trimmed FASTQC

NB3 No oil Index – ACAGTG

20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_ACAGTG_L001_R1_002_fastqc.html

NB6 No oil Index – GCCAAT

20150414_trimmed_2112_lane1_GCCAAT_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_GCCAAT_L001_R1_002_fastqc.html

NB11 No oil Index – CAGATC

20150414_trimmed_2112_lane1_CAGATC_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_CAGATC_L001_R1_002_fastqc.html
20150414_trimmed_2112_lane1_CAGATC_L001_R1_003_fastqc.html

HB2 25,000ppm oil Index – ATCACG

20150414_trimmed_2112_lane1_ATCACG_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_ATCACG_L001_R1_002_fastqc.html
20150414_trimmed_2112_lane1_ATCACG_L001_R1_003_fastqc.html

HB16 25,000ppm oil Index – TTAGGC

20150414_trimmed_2112_lane1_TTAGGC_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_TTAGGC_L001_R1_002_fastqc.html

HB30 25,000ppm oil Index – TGACCA

20150414_trimmed_2112_lane1_TGACCA_L001_R1_001_fastqc.html


Sequence Data Analysis – LSU C.virginica Oil Spill MBD BS-Seq Data

Performed some rudimentary data analysis on the new, demultiplexed data downloaded earlier today:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

 

Compared total amount of data (in gigabytes) generated from each index. The commands below send the output of the ‘ls -l’ command to awk. Awk sums the file sizes, found in the 5th field ($5) of the ‘ls -l’ command, then prints the sum, divided by 1024^3 to convert from bytes to gigabytes.

Index: ACAGTG

$ ls -l 2112_lane1_AC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
1.49652

 

Index: ATCACG

$ ls -l 2112_lane1_AT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.02269

 

Index: CAGATC

$ ls -l 2112_lane1_CA* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.49797

 

Index: GCCAAT

$ ls -l 2112_lane1_GC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.21379

 

Index: TGACCA

$ ls -l 2112_lane1_TG* | awk '{sum += $5} END {print sum/1024/1024/1024}'
0.687374

 

Index: TTAGGC

$ ls -l 2112_lane1_TT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.28902
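The six per-index commands above could be collapsed into a single loop. This is a sketch, assuming the same 2112_lane1_<index>_ file naming; the function name summarize_indexes and the tab-separated output layout (index, bytes, gigabytes) are my own choices, not part of the original commands.

```shell
# Sketch only: summarize_indexes is a hypothetical helper.
# $1 = directory holding the files, remaining arguments = index sequences.
summarize_indexes() {
  dir="$1"
  shift
  for index in "$@"; do
    # Field 5 of 'ls -l' is the file size in bytes; sum it per index.
    ls -l "$dir"/2112_lane1_"${index}"_* |
      awk -v idx="$index" \
        '{ sum += $5 } END { printf "%s\t%.0f\t%.5f\n", idx, sum, sum / 1024 / 1024 / 1024 }'
  done
}

# Usage with the six indexes from this run:
# summarize_indexes . ACAGTG ATCACG CAGATC GCCAAT TGACCA TTAGGC
```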

 

Ran FASTQC on the files listed above, which were downloaded earlier today. The FASTQC command is below. This command runs FASTQC in a for loop over any files that begin with “2112_lane1_” followed by A, T, C, or G (i.e. all of the demultiplexed files) and outputs the analyses to the Arabidopsis folder on Eagle:

$ for file in /Volumes/nightingales/C_virginica/2112_lane1_[ATCG]*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done

 

From within the Eagle/Arabidopsis folder, I renamed the FASTQC output files to prepend today’s date:

$ for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done

 

Then, I unzipped the .zip files generated by FASTQC in order to have access to the images, to eliminate the need for screen shots for display in this notebook entry:

$ for file in 20150413_2112_lane1_[ATCG]*.zip; do unzip "$file"; done

 

The unzip output retained the old naming scheme, so I renamed the unzipped folders:

$ for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done

 

The FASTQC results are linked below:

20150413_2112_lane1_ACAGTG_L001_R1_001_fastqc.html
20150413_2112_lane1_ACAGTG_L001_R1_002_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_001_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_002_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_003_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_001_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_002_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_003_fastqc.html
20150413_2112_lane1_GCCAAT_L001_R1_001_fastqc.html
20150413_2112_lane1_GCCAAT_L001_R1_002_fastqc.html
20150413_2112_lane1_TGACCA_L001_R1_001_fastqc.html
20150413_2112_lane1_TTAGGC_L001_R1_001_fastqc.html
20150413_2112_lane1_TTAGGC_L001_R1_002_fastqc.html

 


Sequence Data – LSU C.virginica Oil Spill MBD BS-Seq Demultiplexed

I had previously contacted Doug Turnbull at the Univ. of Oregon Genomics Core Facility for help demultiplexing this data, as it was initially returned to us as a single data set with “no index” (i.e. barcode) set for any of the libraries that were sequenced. As it turns out, when multiplexed libraries are sequenced using the Illumina platform, an index read step needs to be “enabled” on the machine for sequencing. Otherwise, the machine does not perform the index read step (since it wouldn’t be necessary for a single library). Surprisingly, the sample submission form for the Univ. of Oregon Genomics Core Facility doesn’t request any information regarding whether or not a submitted sample has been multiplexed. However, by default, they enable the index read step on all sequencing runs. I provided them with the barcodes and they demultiplexed the data after the fact.

I downloaded the new, demultiplexed files to Owl/nightingales/C_virginica:

lane1_ACAGTG_L001_R1_001.fastq.gz
lane1_ACAGTG_L001_R1_002.fastq.gz
lane1_ATCACG_L001_R1_001.fastq.gz
lane1_ATCACG_L001_R1_002.fastq.gz
lane1_ATCACG_L001_R1_003.fastq.gz
lane1_CAGATC_L001_R1_001.fastq.gz
lane1_CAGATC_L001_R1_002.fastq.gz
lane1_CAGATC_L001_R1_003.fastq.gz
lane1_GCCAAT_L001_R1_001.fastq.gz
lane1_GCCAAT_L001_R1_002.fastq.gz
lane1_TGACCA_L001_R1_001.fastq.gz
lane1_TTAGGC_L001_R1_001.fastq.gz
lane1_TTAGGC_L001_R1_002.fastq.gz

Notice that the file names now contain the corresponding index!

Renamed the files to prepend the order number to the file names:

$ for file in lane1*; do mv "$file" "2112_$file"; done

New file names:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

Updated the checksums.md5 file to include the new files (the command is written to exclude the previously downloaded files that are named “2112_lane1_NoIndex_”; the [^N] bracket expression in the shell glob excludes any files that have a capital ‘N’ at that position in the file name):

$ for file in 2112_lane1_[^N]*; do md5 "$file" >> checksums.md5; done
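To see which files a negated bracket pattern like this will pick up before writing checksums, the glob can be listed on its own. A small sketch follows; list_indexed is a hypothetical helper name, and it uses [!N], the POSIX spelling of the negation (bash also accepts [^N] as used above).

```shell
# Sketch only: list_indexed is a hypothetical helper.
# $1 = directory to inspect. Prints files named 2112_lane1_<X>... where
# the character right after "2112_lane1_" is anything except a capital N.
list_indexed() {
  (
    cd "$1" || exit 1
    for f in 2112_lane1_[!N]*; do
      # Skip the literal pattern when nothing matches.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
  )
}

# Usage:
# list_indexed /path/to/C_virginica
```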

Updated the readme.md file to reflect the addition of these new files.

 


Bisulfite NGS Library – LSU C.virginica Oil Spill MBD Bisulfite DNA Sequencing Submission

Combined the following libraries in equal quantities (17ng each) to create a single, multiplexed sample for sequencing (LSU_Oil_01):

  • HB2 – 1 (ATCACG)
  • HB16 – 3 (TTAGGC)
  • HB30 – 4 (TGACCA)
  • NB3 – 5 (ACAGTG)
  • NB6 – 6 (GCCAAT)
  • NB11 – 7 (CAGATC)

Quantified pooled libraries using the Quant-iT dsDNA BR Kit (Invitrogen) with a FLx800 plate reader (BioTek). Used 1μL of the pooled sample, run in duplicate. Used 1uL of standards, run in duplicate.

Results:

pooled libraries = 6.575ng/μL

Will submit to University of Oregon Genomics Core Facility for 100bp, single end Illumina HiSeq2500 sequencing. They need 10nM of sample. For a library with average size range of 300-400bp, this requires a sample volume of 20uL with a concentration of 2.28ng/μL in a solution of 0.1% Tween20 in Buffer EB (Qiagen).

Combined 6.94μL of pooled libraries with 13.06μL of 0.1% Tween20/EB solution.
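The dilution numbers above can be checked with two short calculations: converting nM to ng/μL, then C1V1 = C2V2. This is a sketch under stated assumptions: 660 g/mol per base pair is the usual approximation for dsDNA, and the 345bp average fragment length is my assumption, picked from within the 300-400bp range because it reproduces the 2.28ng/μL figure. The function names are hypothetical.

```shell
# Sketch only: both helper names are hypothetical.
# $1 = concentration in nM, $2 = average fragment length in bp (assumed)
nM_to_ng_per_uL() {
  # ng/uL = nM x (660 g/mol per bp x length in bp) / 1,000,000
  awk -v nM="$1" -v bp="$2" 'BEGIN { printf "%.2f\n", nM * 660 * bp / 1e6 }'
}

# $1 = stock conc. (ng/uL), $2 = target conc. (ng/uL), $3 = final volume (uL)
stock_volume_uL() {
  # C1V1 = C2V2, so V1 = C2 x V2 / C1
  awk -v c1="$1" -v c2="$2" -v v2="$3" 'BEGIN { printf "%.2f\n", c2 * v2 / c1 }'
}

# nM_to_ng_per_uL 10 345        # target concentration for 10nM
# stock_volume_uL 6.575 2.28 20 # volume of pooled libraries to use
```

The remaining volume (20μL minus the stock volume) is made up with the 0.1% Tween20/EB solution.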

Submitted sample LSU_Oil_01 to University of Oregon Genomics Core Facility via O/N FedEx on dry ice. Sample was assigned order # 2112.


Bisulfite NGS Library Prep – LSU C.virginica Oil Spill Bisulfite DNA and Emma’s C.gigas Larvae OA Bisulfite DNA

Constructed next generation libraries (Illumina) from yesterday’s bisulfite-treated DNA using the EpiNext Post-Bisulfite DNA Library Preparation Kit – Illumina (Epigentek). Samples were processed according to the manufacturer’s protocol up to Section 8 (Library Amplification) with the following changes:

– Skipped Section 7.1 (recommended to do so in the protocol due to low quantity of input DNA)

Samples were stored O/N @ -20C.

dA Tailing Master Mix

10x Tailing Buffer 1.5uL x 17.6 = 26.4uL

Klenow 1uL x 17.6 = 17.6uL

H2O 0.5uL x 17.6 = 8.8uL

Add 3uL of master mix to each sample

Adaptor Ligation

2x Ligation Buffer 17uL x 17.6 = 299.2uL

T4 DNA Ligase 1uL x 17.6 = 17.6uL

Adaptors 1uL x 17.6 = 17.6uL

Added 19uL of master mix to each sample

dsDNA Conversion Master Mix

5x Conversion Buffer 4uL x 17.6 = 70.4uL

C.P. 2uL x 17.6 = 35.2uL

H2O 3uL x 17.6 = 52.8uL

Add 9uL of master mix to each sample

End Repair

10x Buffer 2uL x 17.6 = 35.2uL

Enzyme 1uL x 17.6 = 17.6uL

H2O 5uL x 17.6 = 88uL

Added 8uL of master mix to each sample
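All four master mixes above use the same scaling: per-sample volume times 17.6. A small sketch for recomputing a mix, assuming components are passed as name=volume pairs; master_mix and that argument format are hypothetical conveniences, not part of the kit protocol.

```shell
# Sketch only: master_mix and the name=volume argument format are assumptions.
# $1 = scaling factor (17.6 in this entry)
# remaining arguments = component=per-sample volume in uL
master_mix() {
  factor="$1"
  shift
  for pair in "$@"; do
    # Split "name=volume" and print name, tab, scaled volume in uL.
    awk -v name="${pair%%=*}" -v vol="${pair#*=}" -v n="$factor" \
      'BEGIN { printf "%s\t%.1f\n", name, vol * n }'
  done
}

# Usage for the dsDNA conversion mix:
# master_mix 17.6 "5x_Conversion_Buffer=4" "C.P.=2" "H2O=3"
```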


Bisulfite Conversion – LSU C.virginica Oil Spill MBD DNA and Emma’s C.gigas Larvae OA DNA

Performed bisulfite conversion on MBD DNA samples from LSU C.virginica oil spill samples (see 201411202 and 20141126) and Emma’s C.gigas larvae OA DNA samples (see 20141121) with the Methylamp DNA Modification Kit (Epigentek).

Added 4uL of H2O to each of Emma’s DNA samples to bring them up to 24uL.

Samples were processed according to the manufacturer’s protocol.

Samples were eluted with 10uL of Solution R6 and stored @ -20C.
