Category Archives: Geoduck Genome Sequencing

NovaSeq Assembly – Trimmed Geoduck NovaSeq with Meraculous

Attempted to use Meraculous to assemble the trimmed geoduck NovaSeq data.

Here’s the Meraculous manual (PDF).

After a string of issues (running out of hard drive space multiple times, config file problems, typos), I’ve finally given up on running Meraculous. It failed again, saying it couldn’t find a file in a directory that Meraculous itself created! I’ve emailed the authors; if they have an easy fix, I’ll implement it and see what happens.

Anyway, it’s all documented in the Jupyter Notebook below.

One good thing that came out of all this is that I had to run kmergenie to identify an appropriate kmer size to use for assembly, as well as an estimated genome size. This info is needed for both Meraculous and SOAPdenovo (which I’ll be trying next):

kmergenie output folder: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180206_kmergenie/
kmergenie HTML report (doesn’t display histograms for some reason): 20180206_kmergenie/histograms_report.html
kmer size: 117
Est. genome size: 2.17Gbp
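
A minimal sketch of how a kmergenie run like this can be set up. It is not the command from the notebook: the FASTQ names, the `reads.txt` list file, and the thread count are hypothetical, and the `-o` (output prefix) and `-t` (threads) options should be checked against kmergenie’s own usage text before relying on them.

```python
def kmergenie_inputs(fastq_files, list_file="reads.txt", prefix="histograms", threads=4):
    """Return (file-of-filenames text, argv list) for a kmergenie run.

    kmergenie can take a text file listing one read file per line,
    so the listing text is what would be written to `list_file`.
    """
    listing = "\n".join(fastq_files) + "\n"
    argv = ["kmergenie", list_file, "-o", prefix, "-t", str(threads)]
    return listing, argv


# Hypothetical file names, for illustration only.
listing, argv = kmergenie_inputs(["sample_R1.fq.gz", "sample_R2.fq.gz"], threads=8)
print(listing, end="")
print(" ".join(argv))
```

The function only builds the inputs; writing `listing` to disk and launching the command (e.g. with `subprocess.run(argv, check=True)`) is left to the caller.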

Jupyter Notebook (GitHub): 20180205_roadrunner_meraculous_geoduck_novaseq.ipynb

Adapter Trimming and FASTQC – Illumina Geoduck Novaseq Data

We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.

Steven previously ran the raw data through FASTQC and found a significant amount of adapter contamination (up to 44% in some libraries); see his FASTQC report here.

So, I trimmed the reads using Trim Galore and re-ran FASTQC on them.

This required two rounds of trimming using the “auto-detect” feature of Trim Galore.

  • Round 1: remove NovaSeq adapters
  • Round 2: remove standard Illumina adapters
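
The two rounds can be chained roughly as sketched below, with round 2 taking round 1’s output as its input. This is an illustration rather than the notebook’s actual commands: the sample names and directory names are hypothetical, and the predicted `_val_1`/`_val_2` output names assume Trim Galore’s usual paired-end naming, so check them against real output. No adapter option is passed, which leaves Trim Galore in auto-detect mode.

```python
FASTQ_EXTS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def trimmed_name(fastq, out_dir, mate):
    """Predict the paired-end output name for one input file (assumed naming)."""
    name = fastq.rsplit("/", 1)[-1]
    gz = ".gz" if name.endswith(".gz") else ""
    for ext in FASTQ_EXTS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return f"{out_dir}/{name}_val_{mate}.fq{gz}"


def trim_round(r1, r2, out_dir):
    """Return (argv, trimmed_r1, trimmed_r2) for one paired-end trimming pass."""
    # No adapter sequence is specified, so the adapter is auto-detected.
    argv = ["trim_galore", "--paired", "--output_dir", out_dir, r1, r2]
    return argv, trimmed_name(r1, out_dir, 1), trimmed_name(r2, out_dir, 2)


# Hypothetical sample; round 2 consumes the files produced by round 1.
cmd1, r1_a, r2_a = trim_round("raw/sample_R1.fastq.gz", "raw/sample_R2.fastq.gz", "trim_round1")
cmd2, r1_b, r2_b = trim_round(r1_a, r2_a, "trim_round2")
for cmd in (cmd1, cmd2):
    print(" ".join(cmd))
```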

See Jupyter notebook below for the gritty details.

Results:

All data for this NovaSeq assembly project can be found here: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.

Round 1 Trim Galore reports: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180125_trim_galore_reports/
Round 1 FASTQC: 20180129_trimmed_multiqc_fastqc_01
Round 1 FASTQC MultiQC overview: 20180129_trimmed_multiqc_fastqc_01/multiqc_report.html

Round 2 Trim Galore reports: 20180125_geoduck_novaseq/20180205_trim_galore_reports/
Round 2 FASTQC: 20180205_trimmed_fastqc_02/
Round 2 FASTQC MultiQC overview: 20180205_trimmed_multiqc_fastqc_02/multiqc_report.html

Astute observers might notice that “Per Base Sequence Content” generates a “Fail” warning for all samples. Per the FASTQC help, this is likely expected (NovaSeq libraries are prepared using transposases) and doesn’t have any downstream impact on analyses.

Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb

Data Management – Illumina Geoduck HiSeq & MiSeq Data

The HDD we received from Illumina last week only had data (i.e. fastq files) from the NovaSeq runs they performed; there was nothing from either the MiSeq or the HiSeq runs.

We contacted them about the missing data. They confirmed it was missing and uploaded the remaining data to BaseSpace.

Began downloading the data – will take a while…

Files will be temporarily stored in these locations:

/volume1/web/nightingales/Geoduck_MiSeq/170317_M03814_0172_000000000-B2K79/Data/GeoDuckRNAMiSeq-35978947

/volume1/web/nightingales/Geoduck_HiSeq/170228_ST-K00104_0382_BHHGTLBBXX/Data/Ironman-35682656

/volume1/web/nightingales/Geoduck_HiSeq/170228_ST-K00104_0381_AHHHWNBBXX/Data/Ironman-35682656

Data Received – Geoduck Genome Sequencing by Illumina

We previously sent some geoduck samples to Illumina, as part of a pilot project for them to test out a new sequencing platform. The data has finally arrived!

It was sent on a 4TB Seagate external hard drive.

Due to weird connection issues we’ve recently encountered with our server, Owl (Synology DS1812+), I connected the HDD directly to Owl via USB (instead of connecting to a computer and transferring). I transferred the data using the Synology web interface to avoid any computer/NAS connection issues that might interrupt the transfer.

We have a meeting with the Illumina people tomorrow afternoon to review the data they’ve provided (looks like it’s going to take a while, though). Once that meeting takes place, we’ll figure out how to document this project in our data management plan.

DNA Isolation – Geoduck gDNA for Illumina-initiated Sequencing Project

We were previously approached by Cindy Lawley (Illumina Market Development) about possible participation in an Illumina product development project. They wanted to have some geoduck tissue and DNA on hand in case Illumina green-lighted the use of geoduck for testing out the new sequencing platform on non-model organisms. Well, guess what: Illumina has given the green light for sequencing our geoduck! However, they need at least 4μg of gDNA, so I’m isolating more.

Isolated DNA from ctenidia tissue from the same Panopea generosa individual used for the BGI sequencing efforts. Tissue was collected by Brent & Steven on 20150811.

Used the E.Z.N.A. Mollusc Kit (Omega) to isolate DNA from five separate ~60mg pieces of ctenidia tissue according to the manufacturer’s protocol, with the following changes:

  • Samples were homogenized with a disposable plastic pestle in 350μL of ML1 Buffer
  • Incubated homogenate at 60°C for 1hr
  • No optional steps were used
  • Performed three rounds of 24:1 chloroform:IAA treatment
  • Eluted each in 50μL of Elution Buffer and pooled into a single sample

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 1μL of DNA sample.

Concentration = 162ng/μL (Quant data is here [Google Sheet]: 20170105_gDNA_geoduck_qubit_quant)

Yield is great (total = ~32μg).
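
The yield is just concentration times pooled volume. A quick sketch of that arithmetic follows, with one loud assumption: the recovered volume isn’t stated, so 250μL (five 50μL elutions) is only the nominal maximum. The reported ~32μg would correspond to roughly 200μL actually recovered.

```python
def yield_ug(conc_ng_per_ul, volume_ul):
    """Total DNA mass in µg from a concentration (ng/µL) and a volume (µL)."""
    return conc_ng_per_ul * volume_ul / 1000.0


def volume_ul_for(mass_ug, conc_ng_per_ul):
    """Volume in µL that contains the requested mass (µg) at this concentration."""
    return mass_ug * 1000.0 / conc_ng_per_ul


CONC = 162.0         # ng/µL, from the Qubit quant
NOMINAL_UL = 5 * 50  # assumption: all five 50 µL elutions fully recovered

print(f"nominal yield: {yield_ug(CONC, NOMINAL_UL):.1f} ug")
print(f"volume implied by ~32 ug: {volume_ul_for(32, CONC):.0f} uL")
print(f"volume holding 10 ug: {volume_ul_for(10, CONC):.1f} uL")
```

The last line uses the same relationship in reverse to get the volume needed for a given mass.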

Evaluated gDNA quality (i.e. integrity) by running 162ng (1μL) of sample on a 0.8% agarose, low-TAE gel stained with ethidium bromide.

Used 5μL of O’GeneRuler DNA Ladder Mix (ThermoFisher).

Results:

DNA looks good: bright high molecular weight band, minimal smearing, and minimal RNA carryover (seen as more intense “smear” at ~500bp).

Will send off 10μg (they only requested 4μg) so that they have extra to work with in case they come across any issues.
