Tag Archives: OA

Sequence Data Analysis – C.gigas Larvae OA BS-Seq Data

Compared total amount of data generated from each index. The commands below send the output of the ‘ls -l’ command to awk. Awk sums the file sizes, found in the 5th field ($5) of the ‘ls -l’ command, then prints the sum, divided by 1024^3 to convert from bytes to gigabytes.

Index: CTTGTA

$ ls -l 2212_lane2_[C]* | awk '{sum += $5} END {print sum/1024/1024/1024}'
5.33341

Index: GCCAAT
$ ls -l 2212_lane2_[G]* | awk '{sum += $5} END {print sum/1024/1024/1024}'
7.00596

There’s ~1.3x as much data in the GCCAAT files (7.00596/5.33341 ≈ 1.31).
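As a quick check on the comparison above, here is a minimal sketch that divides the two totals reported by the awk commands. The variable names are mine (hypothetical), and the two sizes are copied from the output shown above rather than recomputed from the files.

```shell
# Totals (in GB) copied from the two awk outputs above.
cttgta_gb=5.33341
gccaat_gb=7.00596

# bash arithmetic is integer-only, so awk does the floating-point division.
ratio=$(awk -v a="$gccaat_gb" -v b="$cttgta_gb" 'BEGIN {printf "%.2f", a / b}')
echo "GCCAAT/CTTGTA size ratio: $ratio"
```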

 

Ran FASTQC on the following files downloaded earlier today:

2212_lane2_CTTGTA_L002_R1_001.fastq.gz
2212_lane2_CTTGTA_L002_R1_002.fastq.gz
2212_lane2_CTTGTA_L002_R1_003.fastq.gz
2212_lane2_CTTGTA_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_001.fastq.gz
2212_lane2_GCCAAT_L002_R1_002.fastq.gz
2212_lane2_GCCAAT_L002_R1_003.fastq.gz
2212_lane2_GCCAAT_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_005.fastq.gz
2212_lane2_GCCAAT_L002_R1_006.fastq.gz

 

The FASTQC command is below. This command runs FASTQC in a for loop over any files that begin with “2212_lane2_C” or “2212_lane2_G” and outputs the analyses to the Arabidopsis folder on Eagle:

$ for file in /Volumes/nightingales/C_gigas/2212_lane2_[CG]*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done
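A rough sketch of how one might confirm that every input file produced a report before renaming anything. It relies on the FASTQC output naming visible in the links further down (X.fastq.gz becomes X_fastqc.html); the function name missing_reports is hypothetical and was not part of what I ran.

```shell
# Print the base name of every .fastq.gz file in in_dir that has
# no matching <base>_fastqc.html report in out_dir.
missing_reports() {
    local in_dir="$1" out_dir="$2" file base
    for file in "$in_dir"/*.fastq.gz; do
        [ -e "$file" ] || continue          # glob matched nothing
        base=$(basename "$file" .fastq.gz)
        [ -e "$out_dir/${base}_fastqc.html" ] || echo "$base"
    done
}
```

Usage would be something like: missing_reports /Volumes/nightingales/C_gigas /Volumes/Eagle/Arabidopsis. No output means every file has a report.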

 

From within the Eagle/Arabidopsis folder, I renamed the FASTQC output files to prepend today’s date:

$ for file in 2212_lane2_[GC]*; do mv "$file" "20150413_$file"; done

 

Then, I unzipped the .zip files generated by FASTQC in order to have access to the images, to eliminate the need for screen shots for display in this notebook entry:

$ for file in 20150413_2212_lane2_[CG]*.zip; do unzip "$file"; done

 

The unzip output retained the old naming scheme, so I renamed the unzipped folders:

$ for file in 2212_lane2_[GC]*; do mv "$file" "20150413_$file"; done
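Since the same date-prefix rename is done twice (once for the FASTQC output files, once for the unzipped folders), it could be wrapped in a small helper. This is only a sketch: the function name add_prefix is hypothetical, and it assumes bare names in the current directory (no paths). It skips anything that already carries the prefix, so running it a second time is harmless.

```shell
# Prepend a prefix to each named file or folder, skipping names that
# already start with that prefix.
add_prefix() {
    local prefix="$1" file
    shift
    for file in "$@"; do
        case "$file" in
            "$prefix"*) continue ;;     # already renamed
        esac
        if [ -e "$file" ]; then
            mv -- "$file" "${prefix}${file}"
        fi
    done
}
```

For example: add_prefix 20150413_ 2212_lane2_[GC]*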

 

The FASTQC results are linked below:

20150413_2212_lane2_CTTGTA_L002_R1_001_fastqc.html

20150413_2212_lane2_CTTGTA_L002_R1_002_fastqc.html
20150413_2212_lane2_CTTGTA_L002_R1_003_fastqc.html
20150413_2212_lane2_CTTGTA_L002_R1_004_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_001_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_002_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_003_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_004_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_005_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_006_fastqc.html

 


Sequence Data – C.gigas OA Larvae BS-Seq Demultiplexed

I had previously contacted Doug Turnbull at the Univ. of Oregon Genomics Core Facility for help demultiplexing this data, as it was initially returned to us as a single data set with “no index” (i.e. barcode) set for any of the libraries that were sequenced. As it turns out, when multiplexed libraries are sequenced using the Illumina platform, an index read step needs to be “enabled” on the machine for sequencing. Otherwise, the machine does not perform the index read step (since it wouldn’t be necessary for a single library). Surprisingly, the sample submission form for the Univ. of Oregon Genomics Core Facility doesn’t request any information regarding whether or not a submitted sample has been multiplexed. However, by default, they enable the index read step on all sequencing runs. I provided them with the barcodes and they demultiplexed the data after the fact.

I downloaded the new, demultiplexed files to Owl/nightingales/C_gigas:

lane2_CTTGTA_L002_R1_001.fastq.gz
lane2_CTTGTA_L002_R1_002.fastq.gz
lane2_CTTGTA_L002_R1_003.fastq.gz
lane2_CTTGTA_L002_R1_004.fastq.gz
lane2_GCCAAT_L002_R1_001.fastq.gz
lane2_GCCAAT_L002_R1_002.fastq.gz
lane2_GCCAAT_L002_R1_003.fastq.gz
lane2_GCCAAT_L002_R1_004.fastq.gz
lane2_GCCAAT_L002_R1_005.fastq.gz
lane2_GCCAAT_L002_R1_006.fastq.gz

Notice that the file names now contain the corresponding index!

Renamed the files to prepend the order number to the file names:

$ for file in lane2*; do mv "$file" "2212_$file"; done

New file names:

2212_lane2_CTTGTA_L002_R1_001.fastq.gz
2212_lane2_CTTGTA_L002_R1_002.fastq.gz
2212_lane2_CTTGTA_L002_R1_003.fastq.gz
2212_lane2_CTTGTA_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_001.fastq.gz
2212_lane2_GCCAAT_L002_R1_002.fastq.gz
2212_lane2_GCCAAT_L002_R1_003.fastq.gz
2212_lane2_GCCAAT_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_005.fastq.gz
2212_lane2_GCCAAT_L002_R1_006.fastq.gz

Updated the checksums.md5 file to include the new files (the command is written to exclude the previously downloaded files that are named “2212_lane2_NoIndex_”; the [^N] glob pattern excludes any files that have a capital ‘N’ at that position in the file name):

$ for file in 2212_lane2_[^N]*; do md5 "$file" >> checksums.md5; done
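To illustrate how the [^N] pattern behaves, here is a small self-contained sketch that uses empty placeholder files in a scratch directory (the directory and the matched variable are made up for the demo; no real data is touched). It assumes bash, which accepts [^N] as well as the POSIX spelling [!N].

```shell
# Scratch directory with one old "NoIndex" file and two demultiplexed files.
demo_dir=$(mktemp -d)
touch "$demo_dir/2212_lane2_NoIndex_L002_R1_001.fastq.gz" \
      "$demo_dir/2212_lane2_CTTGTA_L002_R1_001.fastq.gz" \
      "$demo_dir/2212_lane2_GCCAAT_L002_R1_001.fastq.gz"

# [^N] matches exactly one character that is not a capital N,
# so the NoIndex file is skipped by the glob.
matched=()
for file in "$demo_dir"/2212_lane2_[^N]*; do
    matched+=("$(basename "$file")")
done
printf '%s\n' "${matched[@]}"

rm -r "$demo_dir"
```

Only the CTTGTA and GCCAAT file names are printed.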

Updated the readme.md file to reflect the addition of these new files.


Sequencing Data – C.gigas Larvae OA

Our sequencing data (Illumina HiSeq2500, 100SE) for this project has been completed by the Univ. of Oregon Genomics Core Facility (order number 2212).

Samples sequenced/pooled for this run:

Sample Treatment Barcode
400ppm 400ppm GCCAAT
1000ppm 1000ppm CTTGTA

 

All code listed below was run on OS X 10.9.5

Ran a bash script called “download.sh” to download all the files. The script contents were:

#!/bin/bash
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_001.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_002.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_003.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_004.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_005.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_006.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_007.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_008.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_009.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_010.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_011.fastq.gz"
curl -O "http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_012.fastq.gz"

 

Downloaded all 12 fastq.gz files to Owl/web/nightingales/C_gigas
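The twelve URLs in download.sh differ only in the zero-padded file number, so the list could also be generated in a loop. This is a sketch of that alternative, not the script I actually ran; the variable names are hypothetical, and the block only builds and prints the URLs (the download step is left as a comment).

```shell
base_url='http://gcf.uoregon.edu:8080/job/download/2212?fileName='
urls=''
i=1
while [ "$i" -le 12 ]; do
    # %03d zero-pads the counter: 1 -> 001, 12 -> 012
    name=$(printf 'lane2_NoIndex_L002_R1_%03d.fastq.gz' "$i")
    urls="${urls}${base_url}${name}
"
    i=$((i + 1))
done
printf '%s' "$urls"

# To download, each URL must be quoted so the shell leaves the "?" alone:
# printf '%s' "$urls" | while IFS= read -r url; do curl -O "$url"; done
```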

Renamed all files by removing the beginning of each file name (2212?fileName=) and replacing that with 2212_:

$ for file in 2212*lane2_NoIndex_L002_R1_0*; do mv "$file" "${file/#2212?fileName=/2212_}"; done
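The rename relies on bash pattern substitution, ${variable/#pattern/replacement}, where the leading # anchors the match to the start of the value. A minimal sketch on a single name, with no files touched (the new_name variable is just for illustration):

```shell
# Name as saved by "curl -O": everything after the last "/" in the URL.
file='2212?fileName=lane2_NoIndex_L002_R1_001.fastq.gz'

# Replace the leading "2212?fileName=" with "2212_".
# Inside the pattern, "?" is a wildcard for any single character,
# which also matches the literal "?" in the name.
new_name="${file/#2212?fileName=/2212_}"
echo "$new_name"
```

This prints 2212_lane2_NoIndex_L002_R1_001.fastq.gz.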

 

Created a directory readme.md (markdown) file to list & describe directory contents: readme.md

$ ls *.gz >> readme.md

Note: In order for the readme file to appear in the web directory listing, the file cannot be all upper-case.

 

Created MD5 checksums for each of the files: checksums.md5

$ md5 2212* >> checksums.md5
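The checksums are only useful if they get re-checked later. Below is a sketch of one way to do that, assuming the file holds lines in the format the OS X md5 command writes, "MD5 (filename) = hash". The function names md5_of and verify_checksums are hypothetical; the fallback to md5sum is there only so the same sketch also runs on Linux.

```shell
# Print the bare MD5 hash of a file ("md5 -q" on OS X, md5sum elsewhere).
md5_of() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | awk '{print $1}'
    fi
}

# Recompute the hash for every "MD5 (file) = hash" line and compare.
# Returns non-zero if any file fails.
verify_checksums() {
    local sums_file="$1" line name expected actual status=0
    while IFS= read -r line; do
        name=${line#"MD5 ("}        # drop the leading "MD5 ("
        name=${name%") = "*}        # drop ") = <hash>"
        expected=${line##* = }      # keep only the hash
        actual=$(md5_of "$name")
        if [ "$actual" = "$expected" ]; then
            echo "OK      $name"
        else
            echo "FAILED  $name"
            status=1
        fi
    done < "$sums_file"
    return "$status"
}
```

Run from the directory containing the data files: verify_checksums checksums.md5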


Library Quality Assessment – C.gigas OA larvae Illumina libraries

Ran the 400ppm library and the 1000ppm library preps on a DNA1000 Assay Chip (Agilent) on the Agilent 2100 Bioanalyzer.

 

Results:

Data File (XAD): 2100_expert_DNA_1000_DE72902486_2015-03-02_09-18-02.xad

Electropherogram overlay of both samples:

Red = 400ppm

Blue = 1000ppm

 

 

 

Measurement data and parameters are here: 20150302_Bioanalyzer_Cgigas_400_1000ppm_BS-Seq

 

Both libraries look good; no adaptor contamination (peak would be present at ~125bp), good library sizes.

Pooled equal quantities of each library, based on the concentration values above, to prepare the sample for sequencing.

Component Volume (μL) Quantity (ng)
400ppm library 10 14.7
1000ppm library 1.09 14.7
Buffer EB 7.81 N/A
1% Tween20 2.1 N/A
Total 21 N/A
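The numbers in the pooling table can be cross-checked with a little arithmetic. This sketch uses only the values from the table; the library concentration and the final Tween20 percentage are derived here from those values, not separate measurements, and the variable names are mine.

```shell
# Sum of the four component volumes (uL).
total_ul=$(awk 'BEGIN {printf "%.2f", 10 + 1.09 + 7.81 + 2.1}')

# Concentration (ng/uL) implied by 14.7 ng in 1.09 uL of the 1000ppm library.
conc_1000ppm=$(awk 'BEGIN {printf "%.2f", 14.7 / 1.09}')

# Final Tween20 percentage: 2.1 uL of a 1% stock diluted into 21 uL.
tween_final_pct=$(awk 'BEGIN {printf "%.2f", 1 * 2.1 / 21}')

echo "Total volume: $total_ul uL"
echo "1000ppm library: $conc_1000ppm ng/uL"
echo "Final Tween20: $tween_final_pct %"
```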

 

The pooled libraries will be submitted tomorrow to the Genomics Core Facility at the Univ. of Oregon for high-throughput sequencing (100bp, SE) on the HiSeq2500 (Illumina). Sample order #2212.


BS-seq Library Prep – C.gigas Larvae OA 1000ppm

Bisulfite Conversion

Pooled 200ng each of the sheared 1B1 (4μL) & 1B2 (used the entire sample, 20μL) 5.13.11 1000ppm C.gigas larvae DNA samples for a total of 400ng. Total volume = 24μL.

Quantified the pooled DNA using the NanoDrop1000 (ThermoFisher) prior to initiating bisulfite conversion.

Clearly, the NanoDrop measurements differ from the expected concentration. NanoDrop suggests the total amount of input DNA is ~1400ng (58ng/μL x 24μL = 1392ng). This is most likely due to RNA carryover, as DNA quantification using a fluorescence-based, double-stranded DNA assay performed previously shows a drastically lower concentration.

Proceeded with bisulfite conversion using the Methylamp DNA Modification Kit (Epigentek) in 1.5mL tube, according to the manufacturer’s protocol:

  • Added 1μL to sample, incubated 10mins @ 37C in water bath
  • Made fresh R1/R2/R3 solution (1.1mL R3 buffer added to vial of R2, vortexed 2mins, 40μL R1 added to mixture – Remainder stored @ -20C in “-20C Kit Components Box”)
  • Added 125μL of R1/R2/R3 solution to sample, incubated 90mins @ 65C in heating block with water
  • Added 300μL R4 to sample, mixed, transferred to column, spun 12,000RPM 30s
  • Added 200μL R5 to column, spun 12,000RPM 30s
  • Added 50μL R1/ethanol solution to column, incubated 8mins @ RT, spun 12,000RPM 30s
  • Washed column with 200μL of 90% EtOH, spun 12,000RPM 30s; repeated one time.
  • Eluted DNA with 12μL R6, spun 12,000RPM 30s

Quantified post-bisulfite-treated sample on NanoDrop1000:

Definitely a low yield (~108ng) relative to the input (~400ng). Will proceed with Illumina library prep.
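For reference, the two figures quoted in this section work out as follows. It is a minimal sketch using only the numbers stated above (58ng/μL in 24μL, ~108ng recovered from a nominal 400ng input); the variable names are hypothetical.

```shell
# Total DNA suggested by the NanoDrop reading: 58 ng/uL x 24 uL.
nanodrop_total_ng=$((58 * 24))

# Post-bisulfite recovery relative to the nominal 400 ng input.
recovery_pct=$(awk 'BEGIN {printf "%.0f", 108 / 400 * 100}')

echo "NanoDrop total: $nanodrop_total_ng ng"
echo "Recovery: $recovery_pct %"
```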

 

Library Prep

Illumina library prep was performed with EpiNext Post-Bisulfite DNA Library Preparation Kit (Illumina) (Epigentek).  Changes to the manufacturer’s protocol:

  • Samples were transferred to 1.5mL snap cap tubes for all magnetic bead steps in order to fit in our tube magnets.
  • PCR cycles: 15

No other changes were made to the manufacturer’s protocol.

Epigentek Barcode Indices assigned, per their recommendations for using two libraries for multiplexing (this will be combined with the 400ppm library):

Barcode #12 – CTTGTA

The library was stored @ -20C and will be checked via Bioanalyzer on Monday.


DNA Quantification – C.gigas Larvae 1000ppm

After the discovery that there wasn’t any DNA in the BS-seq Illumina library prep and no DNA in the bisulfite-treated DNA pool, I decided to try to recover any residual DNA left in the 1B2 sample. Sample 1B2 (sheared on 20150109) was dry, so I added 20μL of Buffer EB (Qiagen) to the tube. I vortexed both the 1B1 and 1B2 samples and quantified on the NanoDrop1000 (ThermoFisher). I also re-quantified the pooled BS-treated sample that had been used as input DNA for the libraries.

Results:

Spreadsheet: 20150226_Claire_sheared_Emma_1000ppm_OD260s

Sample 1B1 has ample DNA in it. Since these samples are pools of larvae, we may be able to just proceed with this sample and not worry about pooling with the biological replicate 1B2.

Sample 1B2 has a low amount of DNA, but it’s a usable quantity (total 400ng).

The pooled sample has nothing.

Will make a new pool of DNA from both 1B1 and 1B2 and attempt to make a new bisulfite-treated library.


Library Prep – Quantification of C.gigas larvae OA 1000ppm library

The completed BS Illumina library made on Friday (1000ppm) was quantified via fluorescence using the Quant-iT DNA BR Kit (Life Technologies/Invitrogen). Also quantified Jake’s libraries. Used 1μL of each sample and the standards. All standards were run in duplicate. Due to limited sample, each library was run only once, without replication. Fluorescence was measured on a FLx800 plate reader (BioTek), using the Gen5 (BioTek) software for all calculations.

Results:

20150209_CgigasOA_BSlibrraryQuants_OluridaLibraryQuants

The good news is that the standard curve looked fine, with an R²=0.998.

The bad news is that there’s no detectable DNA in the sample, just like last time.

Possibly something is totally shot with this sample?  Will quantify the sheared DNA and decide what to do.

I quantified the sheared DNA and there’s nothing there! Where did it go? I just don’t get it. It was sheared, speed-vac’d and resuspended.  All the DNA should still be in the tubes…


Bisulfite NGS Library Prep – Bisulfite Conversion & Illumina Library Construction of C.gigas larvae DNA

Bisulfite Conversion

The previous attempt at constructing a library for the 1000ppm larvae samples failed. I had previously sheared, quantified, and concentrated the DNA from this sample. As I had done previously, I combined 50ng from each of the two 1000ppm samples for a total of 100ng, and brought the sample volume up to 24μL with NanoPure H2O.

Bisulfite conversion was performed with the Methylamp DNA Modification Kit (Epigentek) according to the manufacturer’s protocol.

Sample was eluted with 10μL of Buffer R6 for immediate use.

 

Library Prep

Bisulfite Illumina library was made with EpiNext Post-Bisulfite DNA Library Preparation Kit (Illumina) (Epigentek).  Changes to the manufacturer’s protocol:

  • Samples were transferred to 1.5mL snap cap tubes for all magnetic bead steps in order to fit in our tube magnets.
  • Skipped Step 7.1 (per manufacturer’s recommendation for samples starting with <200ng)
  • Ran 13 cycles during the library amplification step (per manufacturer’s recommendation for samples starting with 100ng)

Sample was transferred to 1.5mL snap cap tube and stored @ -20C.  Will quantify library on Monday when Jake is also finished with his 12 libraries.

 


Bisulfite NGS Library Prep – C.gigas larvae OA bisulfite library quantification

The two completed BS Illumina libraries (400ppm and 1000ppm) were quantified via fluorescence using the Quant-iT DNA BR Kit (Life Technologies/Invitrogen). Used 1μL of each sample and the standards. All standards were run in triplicate. Due to limited sample, each of the two libraries was run only once, without replication. Fluorescence was measured on a FLx800 plate reader (BioTek).

 

Results:

The standard curve, raw fluorescence, and calculated concentrations (as determined by the Gen5 (BioTek) software) can be seen here: 20150128_CgigasOA_BSlibrraryQuants_OluridaLibraryQuants

The standard curve was excellent, with an R² value of 0.999.

 

Sample Concentration (ng/uL)
400ppm 10.592
1000ppm 0.0

 

The 400ppm library looks great, with a good yield.

The 1000ppm library appears to have no measurable quantity of DNA in it. This is surprising, and disconcerting, as both samples were processed in parallel. As such, there should be virtually no difference between them with regard to the library construction process and subsequent yields.

To verify that this wasn’t a pipetting error on my part, I re-quantified the 1000ppm library (in duplicate) and still no detectable DNA.

Will repeat the bisulfite conversion and library construction process on the 1000ppm sample in order to generate a usable library for sequencing.


Bisulfite NGS Library Prep – C.gigas larvae OA bisulfite DNA

The two pooled bisulfite-treated DNA samples (400ppm and 1000ppm) from 20150114 were used to prepare bisulfite Illumina libraries with EpiNext Post-Bisulfite DNA Library Preparation Kit (Illumina) (Epigentek).  Changes to the manufacturer’s protocol:

  • Samples were transferred to 1.5mL snap cap tubes for all magnetic bead steps in order to fit in our tube magnets.
  • Stopped after End Repair step (prior to magnetic bead clean up).  Samples stored @ -20C