Tag Archives: trimming

BS-seq Mapping – Olympia oyster bisulfite sequencing: TrimGalore > FastQC > Bismark

Steven asked me to evaluate our methylation sequencing data sets for Olympia oyster.

According to our Olympia oyster genome wiki, we have the following two sets of BS-seq data:

All computing was conducted on our Apple Xserve: roadrunner.

All steps were documented in this Jupyter Notebook (GitHub): 20180503_emu_oly_methylation_mapping.ipynb

NOTE: The Jupyter Notebook linked above is very large in size. As such it will not render on GitHub. It will need to be downloaded to a computer that can run Jupyter Notebooks and viewed that way.

Here’s a brief overview of what was done.

Samples were trimmed with TrimGalore and then evaluated with FastQC. MultiQC was used to generate a nice visual summary report of all samples.

The Olympia oyster genome assembly, pbjelly_sjw_01, was used as the reference genome and was prepared for use in Bismark:


/home/shared/Bismark-0.19.1/bismark_genome_preparation 
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ 
--verbose /home/sam/data/oly_methylseq/oly_genome/ 
2> 20180507_bismark_genome_prep.err

Bismark was run on trimmed samples with the following command:


/home/shared/Bismark-0.19.1/bismark 
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ 
--genome /home/sam/data/oly_methylseq/oly_genome/ 
-u 1000000 
-p 16 
--non_directional 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/1_ATCACG_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/2_CGATGT_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/3_TTAGGC_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/4_TGACCA_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/5_ACAGTG_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/6_GCCAAT_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/7_CAGATC_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/8_ACTTGA_L001_R1_001_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_10_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_11_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_12_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_13_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_14_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_15_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_16_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_17_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_18_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_1_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_2_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_3_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_4_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_5_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_6_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_7_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_8_s456_trimmed.fq.gz 
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_9_s456_trimmed.fq.gz 
2> 20180507_bismark_02.err

Results:

TrimGalore output folder:

FastQC output folder:

MultiQC output folder:

MultiQC Report (HTML):

Bismark genome folder: 20180503_oly_genome_pbjelly_sjw_01_bismark/

Bismark output folder:


Whole genome BS-seq (2015)

Prep overview
  • Library prep: Roberts Lab
  • Sequencing: Genewiz
Bismark Report Mapping Percentage
1_ATCACG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 40.3%
2_CGATGT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 39.9%
3_TTAGGC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 40.2%
4_TGACCA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 40.4%
5_ACAGTG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 39.9%
6_GCCAAT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 39.6%
7_CAGATC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 39.9%
8_ACTTGA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt 39.7%

MBD BS-seq (2015)

Prep overview
  • MBD: Roberts Lab
  • Library prep: ZymoResearch
  • Sequencing: ZymoResearch
Bismark Report Mapping Percentage
zr1394_1_s456_trimmed_bismark_bt2_SE_report.txt 33.0%
zr1394_2_s456_trimmed_bismark_bt2_SE_report.txt 34.1%
zr1394_3_s456_trimmed_bismark_bt2_SE_report.txt 32.5%
zr1394_4_s456_trimmed_bismark_bt2_SE_report.txt 32.8%
zr1394_5_s456_trimmed_bismark_bt2_SE_report.txt 35.2%
zr1394_6_s456_trimmed_bismark_bt2_SE_report.txt 35.5%
zr1394_7_s456_trimmed_bismark_bt2_SE_report.txt 32.8%
zr1394_8_s456_trimmed_bismark_bt2_SE_report.txt 33.0%
zr1394_9_s456_trimmed_bismark_bt2_SE_report.txt 34.7%
zr1394_10_s456_trimmed_bismark_bt2_SE_report.txt 34.9%
zr1394_11_s456_trimmed_bismark_bt2_SE_report.txt 30.5%
zr1394_12_s456_trimmed_bismark_bt2_SE_report.txt 35.8%
zr1394_13_s456_trimmed_bismark_bt2_SE_report.txt 32.5%
zr1394_14_s456_trimmed_bismark_bt2_SE_report.txt 30.8%
zr1394_15_s456_trimmed_bismark_bt2_SE_report.txt 31.3%
zr1394_16_s456_trimmed_bismark_bt2_SE_report.txt 30.7%
zr1394_17_s456_trimmed_bismark_bt2_SE_report.txt 32.4%
zr1394_18_s456_trimmed_bismark_bt2_SE_report.txt 34.9%
Share

Adapter Trimming and FASTQC – Illumina Geoduck Novaseq Data

We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.

Steven previously ran the raw data through FASTQC and there was a significant amount of adapter contamination (up to 44% in some libraries) present (see his FASTQC report here).

So, I trimmed them using TrimGalore and re-ran FASTQC on them.

This required two rounds of trimming using the “auto-detect” feature of Trim Galore.

  • Round 1: remove NovaSeq adapters
  • Round 2: remove standard Illumina adapters

See Jupyter notebook below for the gritty details.

Results:

All data for this NovaSeq assembly project can be found here: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.

Round 1 Trim Galore reports: [20180125_trim_galore_reports/](http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180125_trim_galore_reports/]
Round 1 FASTQC: 20180129_trimmed_multiqc_fastqc_01
Round 1 FASTQC MultiQC overview: 20180129_trimmed_multiqc_fastqc_01/multiqc_report.html


 
 
 
 
 
 

Round 2 Trim Galore reports: 20180125_geoduck_novaseq/20180205_trim_galore_reports/
Round 2 FASTQC: 20180205_trimmed_fastqc_02/
Round 2 FASTQC MultiQC overview: 20180205_trimmed_multiqc_fastqc_02/multiqc_report.html

 
 
 
 
 
 

For the astute observer, you might notice the “Per Base Sequence Content” generates a “Fail” warning for all samples. Per the FASTQC help, this is likely expected (due to the fact that NovaSeq libraries are prepared using transposases) and doesn’t have any downstream impacts on analyses.

 
 
 
 
 
 

Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb

Share