Tag Archives: Illumina

Data Management – Illumina NovaSeq Geoduck Genome Sequencing

As part of the Illumina collaborative geoduck genome sequencing project, their end goal has always been to sequence the genome in a single run.

They’ve finally attempted this by running 10x Genomics, Hi-C, Nextera, and TruSeq libraries in a single run of the NovaSeq.

I downloaded the data using the BaseSpace downloader using Chrome on a Windows 7 computer (this is not available on Ubuntu and the command line tools that are available from Illumina are too confusing for me to bother spending the time on to figure out how to use them just to download the data).

Data was saved here:

Generated MD5 checksums (using md5sum on Ubuntu) and appended to the checksums file:

Illumina was unable to provide MD5 checksums on their end, so I was unable to confirm data integrity post-download.

Illumina sample info is here:

Will add info to:

List of files received:

10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleA_S10_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleB_S11_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleC_S12_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x5-A3-MultipleD_S13_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleA_S14_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleB_S15_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleC_S16_L002_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L001_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L001_R2_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L002_R1_001.fastq.gz
10x-Genomics-Libraries-Geo10x6-B3-MultipleD_S17_L002_R2_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L001_R1_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L001_R2_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L002_R1_001.fastq.gz
HiC-Libraries-GeoHiC-C3-N701_S18_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP10-B2-AD013_S7_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP11-C2-AD014_S8_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP12-D2-AD015_S9_L002_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L001_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L001_R2_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L002_R1_001.fastq.gz
Nextera-Mate-Pair-Library-GeoNMP9-A2-AD002_S6_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA1-A1-NR006_S1_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA3-C1-NR012_S2_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA5-E1-NR005_S3_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA7-G1-NR019_S4_L002_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L001_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L001_R2_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R1_001.fastq.gz
Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R2_001.fastq.gz
Share

Genome Assembly – SparseAssembler Geoduck Genomic Data, kmer=101

UPDATE 20180413

Assembly complete. See end of post for data locations.


UPDATE 20180410

Received a status update email:

SLURM Job_id=156637 Name=20180405_sparse_assembler_kmer101_geo Ended, Run time 4-20:17:08, CANCELLED, ExitCode 0

After talking to Steven, it turns out Mox was taken offline for maintenance, which killed all jobs (and access). Ugh.

Will restart tonight once Mox is back online.


OK, here we go! Initiatied an assembly run using SparseAssembler on our Mox HPC node on all of our geoduck genomic sequencing data:

Kmer size set to 101.

This is 118 files of sequencing data!! Fingers crossed…

Slurm script: 20180405_sparse_assembler_kmer101_geoduck_slurm.sh


#!/bin/bash
## Job Name
#SBATCH --job-name=20180405_sparse_assembler_kmer101_geo
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180405_sparseassembler_kmer101_geoduck

/gscratch/srlab/programs/SparseAssembler/SparseAssembler 
LD 0 
NodeCovTh 1 
EdgeCovTh 0 
k 101 
g 15 
PathCovTh 100 
GS 2200000000 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/AD002_S9_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/AD002_S9_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/AD002_S9_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/AD002_S9_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR005_S4_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR005_S4_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR005_S4_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR005_S4_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR006_S3_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR006_S3_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR006_S3_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR006_S3_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR012_S1_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR012_S1_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR012_S1_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR012_S1_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR013_AD013_S2_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR014_AD014_S5_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR014_AD014_S5_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR014_AD014_S5_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR014_AD014_S5_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR015_AD015_S6_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR015_AD015_S6_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR015_AD015_S6_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR015_AD015_S6_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR019_S7_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR019_S7_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR019_S7_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR019_S7_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR021_S8_L001_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR021_S8_L001_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR021_S8_L002_R1_001_val_1_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/20180129_trimgalore_geoduck_novaseq/NR021_S8_L002_R2_001_val_2_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/151114_I191_FCH3Y35BCXX_L1_wHAIPI023989-79_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/151114_I191_FCH3Y35BCXX_L1_wHAIPI023989-79_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/151114_I191_FCH3Y35BCXX_L2_wHAMPI023988-81_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/151114_I191_FCH3Y35BCXX_L2_wHAMPI023988-81_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/151122_I136_FCH3L2FBBXX_L7_wHAXPI023990-97_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/151122_I136_FCH3L2FBBXX_L7_wHAXPI023990-97_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L3_WHPANwalDDAADWAAPEI-101_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L3_WHPANwalDDAADWAAPEI-101_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L4_WHPANwalDDAADWAAPEI-101_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L4_WHPANwalDDAADWAAPEI-101_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L5_WHPANwalDDABDLAAPEI-100_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L5_WHPANwalDDABDLAAPEI-100_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L5_WHPANwalDDACDTAAPEI-102_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L5_WHPANwalDDACDTAAPEI-102_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L6_WHPANwalDDABDLAAPEI-100_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L6_WHPANwalDDABDLAAPEI-100_2.fastq 
i1 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L6_WHPANwalDDACDTAAPEI-102_1.fastq 
i2 /gscratch/scrubbed/samwhite/bgi_geoduck/160103_I137_FCH3V5YBBXX_L6_WHPANwalDDACDTAAPEI-102_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-1_S1_L001_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-1_S1_L001_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2_S5_L002_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2_S5_L002_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-1_S2_L001_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-1_S2_L001_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-2_S6_L002_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-2_S6_L002_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-3_S10_L003_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-3_S10_L003_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-4_S14_L004_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-4_S14_L004_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-5_S18_L005_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-5_S18_L005_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-6_S22_L006_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-6_S22_L006_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-7_S26_L007_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-7_S26_L007_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-8_S30_L008_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-2to4kb-8_S30_L008_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-3_S9_L003_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-3_S9_L003_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-4_S13_L004_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-4_S13_L004_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5_S17_L005_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5_S17_L005_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-1_S3_L001_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-1_S3_L001_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-2_S7_L002_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-2_S7_L002_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-3_S11_L003_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-3_S11_L003_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-4_S15_L004_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-4_S15_L004_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-5_S19_L005_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-5_S19_L005_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-6_S23_L006_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-6_S23_L006_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-7_S27_L007_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-7_S27_L007_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-8_S31_L008_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-5to7kb-8_S31_L008_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-6_S21_L006_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-6_S21_L006_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-7_S25_L007_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-7_S25_L007_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8_S29_L008_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8_S29_L008_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-1_S4_L001_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-1_S4_L001_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-2_S8_L002_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-2_S8_L002_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-3_S12_L003_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-3_S12_L003_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-4_S16_L004_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-4_S16_L004_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-5_S20_L005_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-5_S20_L005_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-6_S24_L006_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-6_S24_L006_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-7_S28_L007_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-7_S28_L007_R2_001_val_2.fastq 
i1 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-8_S32_L008_R1_001_val_1.fastq 
i2 /gscratch/scrubbed/samwhite/illumina_geoduck_hiseq/20180328_trim_galore_illumina_hiseq_geoduck/Geoduck-NMP-gDNA-8to10kb-8_S32_L008_R2_001_val_2.fastq
Results:

Output folder: 20180405_sparseassembler_kmer101_geoduck/

Slurm output files:

SparseAssembler Assembly (FASTA): Contigs.txt

Added this to the GitHub wiki for our geoduck genome assemblies.

Share

Data Received – Crassostrea virginica MBD BS-seq from ZymoResearch

Received the sequencing data from ZymoResearch for the <em>Crassostrea virginica</em> gonad MBD DNA that was sent to them on 20180207 for bisulfite conversion, library construction, and sequencing.

Gzipped FASTQ files were:

  1. downloaded to Owl/nightingales/C_virginica
  2. MD5 checksums verified
  3. MD5 checksums appended to the checksums.md5 file
  4. readme.md file updated
  5. Updated nightingales Google Sheet

Here’s the list of files received:

zr2096_10_s1_R1.fastq.gz
zr2096_10_s1_R2.fastq.gz
zr2096_1_s1_R1.fastq.gz
zr2096_1_s1_R2.fastq.gz
zr2096_2_s1_R1.fastq.gz
zr2096_2_s1_R2.fastq.gz
zr2096_3_s1_R1.fastq.gz
zr2096_3_s1_R2.fastq.gz
zr2096_4_s1_R1.fastq.gz
zr2096_4_s1_R2.fastq.gz
zr2096_5_s1_R1.fastq.gz
zr2096_5_s1_R2.fastq.gz
zr2096_6_s1_R1.fastq.gz
zr2096_6_s1_R2.fastq.gz
zr2096_7_s1_R1.fastq.gz
zr2096_7_s1_R2.fastq.gz
zr2096_8_s1_R1.fastq.gz
zr2096_8_s1_R2.fastq.gz
zr2096_9_s1_R1.fastq.gz
zr2096_9_s1_R2.fastq.gz

Here’s the sample processing history:

Share

FastQC/MultiQC – Illumina HiSeq Genome Sequencing Data

Since running SparseAssembler seems to be working and actually able to produce assemblies, I’ve decided I’ll try to beef up the geoduck genome assembly with the rest of our existing genomic sequencing data.

Yesterday, I transferred our BGI geoduck data to our Mox node and ran it through FASTQC

I transferred our Illumina HiSeq data sets (*NMP*.fastq.gz) to our Mox node (/gscratch/scrubbed/samwhite/illumina_geoduck_hiseq). These were part of the Illumina-sponsored sequencing project.

I verified the MD5 checksums (not documented) and then ran FASTQC, followed by MultiQC.

FastQC slurm script: 20180328_fastqc_illumina_geoduck_hiseq_slurm.sh

This was followed with MultiQC (locally, after copying the the FastQC output to Owl).

Results:

FASTQC output: 20180328_illumina_hiseq_geoduck_fastqc

MultiQC output: 20180328_illumina_hiseq_geoduck_fastqc/multiqc_data

MultiQC HTML report: 20180328_illumina_hiseq_geoduck_fastqc/multiqc_data/multiqc_report.html

Well, lots of fails. I high level of “Per Base N Content” (these are only warnings, but we also haven’t received data with these warnings before). Also, they all fail in the “Overrepresented sequences” analysis.

I’ll run these through TrimGalore! (probably twice), and see how things change.

Share

NovaSeq Assembly – Trimmed Geoduck NovaSeq with Meraculous

Attempted to use Meraculous to assemble the trimmed geoduck NovaSeq data.

Here’s the Meraculous manual (PDF).

After a bunch of various issues (running out of hard drive space – multiple times, config file issues, typos), I’ve finally given up on running meraculous. It failed, again, saying it couldn’t find a file in a directory that meraculous created! I’ve emailed the authors and if they have an easy fix, I’ll implement it and see what happens.

Anyway, it’s all documented in the Jupyter Notebook below.

One good thing came out of all of it is that I had to run kmergenie to identify an appopriate kmer size to use for assembly, as well as estimated genome size (this info is needed for both meraculous and SOAPdeNovo (which I’ll be trying next)):

kmergenie output folder: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180206_kmergenie/
kmergenie HTML report (doesn’t display histograms for some reason): 20180206_kmergenie/histograms_report.html
kmer size: 117
Est. genome size: 2.17Gbp

Jupyter Notebook (GitHub): 20180205_roadrunner_meraculous_geoduck_novaseq.ipynb

Share

Adapter Trimming and FASTQC – Illumina Geoduck Novaseq Data

We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.

Steven previously ran the raw data through FASTQC and there was a significant amount of adapter contamination (up to 44% in some libraries) present (see his FASTQC report here).

So, I trimmed them using TrimGalore and re-ran FASTQC on them.

This required two rounds of trimming using the “auto-detect” feature of Trim Galore.

  • Round 1: remove NovaSeq adapters
  • Round 2: remove standard Illumina adapters

See Jupyter notebook below for the gritty details.

Results:

All data for this NovaSeq assembly project can be found here: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.

Round 1 Trim Galore reports: [20180125_trim_galore_reports/](http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180125_trim_galore_reports/]
Round 1 FASTQC: 20180129_trimmed_multiqc_fastqc_01
Round 1 FASTQC MultiQC overview: 20180129_trimmed_multiqc_fastqc_01/multiqc_report.html


 
 
 
 
 
 

Round 2 Trim Galore reports: 20180125_geoduck_novaseq/20180205_trim_galore_reports/
Round 2 FASTQC: 20180205_trimmed_fastqc_02/
Round 2 FASTQC MultiQC overview: 20180205_trimmed_multiqc_fastqc_02/multiqc_report.html

 
 
 
 
 
 

For the astute observer, you might notice the “Per Base Sequence Content” generates a “Fail” warning for all samples. Per the FASTQC help, this is likely expected (due to the fact that NovaSeq libraries are prepared using transposases) and doesn’t have any downstream impacts on analyses.

 
 
 
 
 
 

Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb

Share

Genome Assembly – Olympia Oyster Illumina & PacBio Using PB Jelly w/BGI Scaffold Assembly

After another attempt to fix PB Jelly, I ran it again.

We’ll see how it goes this time…

Re-ran this using the BGI Illumina scaffolds FASTA.

Here’s a brief rundown of how this was run:

See the Jupyter Notebook for full details of run (see Results section below).

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/20171130_oly_pbjelly/

Output FASTA file: http://owl.fish.washington.edu/Athaliana/20171130_oly_pbjelly/jelly.out.fasta

Quast assessment of output FASTA:

Assembly jelly.out
# contigs (>= 0 bp) 696946
# contigs (>= 1000 bp) 159429
# contigs (>= 5000 bp) 68750
# contigs (>= 10000 bp) 35320
# contigs (>= 25000 bp) 7048
# contigs (>= 50000 bp) 894
Total length (>= 0 bp) 1253001795
Total length (>= 1000 bp) 1140787867
Total length (>= 5000 bp) 932263178
Total length (>= 10000 bp) 691523275
Total length (>= 25000 bp) 261425921
Total length (>= 50000 bp) 57741906
# contigs 213264
Largest contig 194507
Total length 1180563613
GC (%) 36.57
N50 12433
N75 5983
L50 26241
L75 60202
# N’s per 100 kbp 6580.58

Have added this assembly to our Olympia oyster genome assemblies table.

This took an insanely long time to complete (nearly six weeks)!!! After some internet searching, I’ve found a pontential solution to this and have initiated another PB Jelly run to see if it will run faster. Regardless, it’ll be interesting to see how the results compare from two independent runs of PB Jelly.

Jupyter Notebook (GitHub): 20171130_emu_pbjelly.ipynb

Share

Genome Assembly – Olympia Oyster Illumina & PacBio Using PB Jelly w/BGI Scaffold Assembly

Yesterday, I ran PB Jelly using Sean’s Platanus assembly, but that didn’t produce an assembly because PB Jelly was expecting gaps in the Illumina reference assembly (i.e. scaffolds, not contigs).

Re-ran this using the BGI Illumina scaffolds FASTA.

Here’s a brief rundown of how this was run:

See the Jupyter Notebook for full details of run (see Results section below).

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/20171114_oly_pbjelly/

Output FASTA file: http://owl.fish.washington.edu/Athaliana/20171114_oly_pbjelly/jelly.out.fasta

OK! This seems to have worked (and it was quick, like less than an hour!), as it actually produced a FASTA file! Will run QUAST with this and some assemblies to compare assembly stats. Have added this assembly to our Olympia oyster genome assemblies table.

Jupyter Notebook (GitHub): 20171114_emu_pbjelly_BGI_scaffold.ipynb

Share