# DNA Isolation & Quantification – Geoduck larvae metagenome filter rinses

Isolated DNA from two of the geoduck hatchery metagenome samples Emma delivered on 20180313 to get an idea of what type of yields we might get from these.

• MG 5/15 #8
• MG 5/19 #6

As mentioned in my notebook entry upon receipt of these samples, I’m a bit skeptical that we’ll get any sort of recovery, based on sample preservation.

Isolated DNA using DNAzol (MRC, Inc.) in the following manner:

1. Added 1mL of DNAzol to each sample; mixed by pipetting.
2. Added 0.5mL of 100% ethanol; mixed by inversion.
3. Pelleted DNA 5,000g x 5mins @ RT.
5. Washed pellets (not visible) with 1mL 75% ethanol by dribbling down side of tubes.
6. Pelleted DNA 5,000g x 5mins @ RT.
7. Discarded supernatants and dried pellets for 5mins.
8. Resuspended DNA in 20uL of Buffer EB (Qiagen).

Samples were quantified using the Roberts Lab Qubit 3.0 with the Qubit High Sensitivity dsDNA Kit (Invitrogen).

5uL of each sample were used.
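For context on what "detectable" means here, this is a rough sketch of how a Qubit reading translates into total yield. It uses the 5uL input and 20uL elution volume from this entry. The 0.2ng lower limit for DNA in the assay tube is an assumption from memory of the HS dsDNA kit specs and should be checked against the kit manual.

```python
# Rough yield arithmetic for a Qubit reading (a sketch, not the instrument's own calculation).

ASSAY_LOWER_LIMIT_NG = 0.2   # ASSUMPTION: lowest DNA mass the HS dsDNA assay can quantify
SAMPLE_VOLUME_UL = 5.0       # volume of each sample added to the assay (from this entry)
ELUTION_VOLUME_UL = 20.0     # Buffer EB resuspension volume (from this entry)


def stock_concentration(ng_in_assay, sample_volume_ul):
    """Concentration (ng/uL) of the original sample, given the DNA mass in the assay tube."""
    return ng_in_assay / sample_volume_ul


def total_yield(conc_ng_per_ul, elution_volume_ul):
    """Total DNA (ng) in the full elution volume."""
    return conc_ng_per_ul * elution_volume_ul


min_conc = stock_concentration(ASSAY_LOWER_LIMIT_NG, SAMPLE_VOLUME_UL)
min_yield = total_yield(min_conc, ELUTION_VOLUME_UL)
print(f"Smallest quantifiable concentration: {min_conc:.2f} ng/uL")
print(f"Smallest quantifiable total yield:   {min_yield:.1f} ng")
```

Under these assumptions, a sample would need to contain somewhere around 0.8ng of DNA in total before the Qubit would report a value.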

#### Results:

As expected, neither sample yielded any detectable DNA.

Will discuss with Steven what should be done with the remaining samples.

# Samples Received – Geoduck larvae metagenome filter rinses

Received geoduck hatchery metagenome samples from Emma. These samples are intended for DNA isolation.

Admittedly, I’m a bit skeptical that we’ll be able to recover any DNA from these samples, as they had been initially stored as frozen liquid, then thawed, and “supernatant” removed. I’m concerned that the freezing step would result in cell lysis; thus the subsequent removal of “supernatant” would actually be removing the majority of cellular contents that would be released during freezing/lysis.

Here’s the sample prep history, per Emma’s email:

Hi!
Here are the relevant details from my lab notebook:

Filters with bacteria to be extracted for proteomics: https://sr320.github.io/Geoduck-larvae-filters/

Each filter was rinsed and cells sonicated:

1. Put filter on petri dish on ice
2. Use 1-4 mL total to wash front (and back if not obvious where biol material is) of filter while holding with forceps over dish – Use 2 pairs of forceps; I used 4 mL ice cold 50 mM NH4HCO3 to wash inside of filter (filters were folded in half). Washed filters returned to bags and stored at -80C.
3. Put wash collected in dish in eppendorf tubes – at this point, remove the amount that will be used for metagenomics (~1/4 of wash) – put 1 mL in metagenome tube (mg) and the remaining was split between 2 tubes for metaproteomics (mp)

These are bacterial cells in ammonium bicarbonate. I spun them down and removed most of the supernatant from each tube.

Let me know if you need any other info!

Box of samples (containing ~38uL of liquid) was stored in FTR209 -20C (top shelf).

# NovaSeq Assembly – The Struggle is Real – Real Annoying!

Well, I continue to struggle to make progress on assembling the geoduck Illumina NovaSeq data. Granted, there is a ton of data (374GB!!!!), but it’s still frustrating that we can’t get an assembly anywhere…

Here are some of the struggles so far:

SOAPdenovo2

JR-Assembler

• Can’t install one of the dependencies (SOAP error correction)
• Actually, I need to try the binary version of this, instead of the source version (the source version fails at the make step)

So, next up will be trying the following two assemblers:

• JR-Assembler: Will see if SOAPec binary will work, and then run an assembly.
• AllPaths-LG: I was able to install this successfully on Mox.

Additionally, we’ve ordered some additional hard drives and will be converting the old head/master node on the Apple Xserve cluster to Linux. The old master node is a little better equipped than the other Apple Xserve “birds”, so will try to re-run Meraculous on it once we get it converted.

# Assembly – Geoduck Illumina NovaSeq SOAPdenovo2 on Mox (FAIL)

Trying to get the NovaSeq data assembled using SOAPdenovo2 on the Mox HPC node we have and it will not work.

Tried a couple of times and it hasn’t run successfully. Here are links to the files used on Mox (including the batch script and slurm output files). I made slight changes to the formatting of the batch script because I thought there was something wrong. Specifically, the slurm output file in the 20180215 runs does not accurately reflect the command I issued (i.e. 1> ass.log is the command I issued, but slurm shows > ass.log).
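On the redirection point: in POSIX shells, `>` on its own redirects file descriptor 1 (stdout), so `1> ass.log` and `> ass.log` should behave identically, and the difference in the slurm output is probably cosmetic. A quick sketch to check this, assuming a POSIX shell is available; the `echo` command and log file name are just stand-ins, not the actual SOAPdenovo2 call.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def run_and_read(redirect, workdir):
    """Run `echo hello` with the given stdout redirection operator and return the log contents."""
    log = workdir / "out.log"
    # shell=True so that the shell itself interprets the redirection operator.
    subprocess.run(f"echo hello {redirect} {shlex.quote(str(log))}", shell=True, check=True)
    return log.read_text()


with tempfile.TemporaryDirectory() as tmp:
    explicit = run_and_read("1>", Path(tmp))   # form used in the batch script
    implicit = run_and_read(">", Path(tmp))    # form shown in the slurm output

print("Same log contents:", explicit == implicit)
```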

NOTE: In the 20180218 run, I have excluded transferring the core dump file due to its crazy size:

Here’s the error log generated by SOAPdenovo2 in the 20180218 run (the last line is all you really need to see, though):

Version 2.04: released on July 13th, 2012
Compile May 10 2017 12:50:52

********************
Pregraph
********************

Parameters: pregraph -s /gscratch/scrubbed/samwhite/20180218_soapdenovo2_novaseq_geoduck/soap_config -K 117 -p 24 -o /gscratch/scrubbed/samwhite/20180218_soapdenovo2_novaseq_geoduck/

In /gscratch/scrubbed/samwhite/20180218_soapdenovo2_novaseq_geoduck/soap_config, 1 lib(s), maximum read length 150, maximum name length 256.

/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R1_001_val_1_val_1.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R2_001_val_2_val_2.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R1_001_val_1_val_1.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R2_001_val_2_val_2.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R1_001_val_1_val_1.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R2_001_val_2_val_2.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R1_001_val_1_val_1.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R2_001_val_2_val_2.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R1_001_val_1_val_1.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R2_001_val_2_val_2.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R1_001_val_1_val_1.fq.gz
/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R2_001_val_2_val_2.fq.gz
-- Out of memory --



I guess I’ll explore some other options for assembling these? I’m having a difficult time accepting that 500GB of RAM is insufficient, but that seems to be the case. Ouch.
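For reference, the log above shows a single library built from twelve trimmed FASTQ files with a maximum read length of 150. Below is a minimal sketch of generating a SOAPdenovo2 config of that shape from a list of files. It is not the actual soap_config used in these runs: the `avg_ins` value is a placeholder (the real insert size isn't recorded here), and the pairing logic assumes `_R1_`/`_R2_` appear in the file names as they do in the log.

```python
from collections import defaultdict


def pair_reads(paths):
    """Group FASTQ paths into (R1, R2) tuples using the text before the _R1_/_R2_ tag."""
    pairs = defaultdict(dict)
    for path in paths:
        for tag in ("_R1_", "_R2_"):
            if tag in path:
                prefix = path.split(tag)[0]
                pairs[prefix][tag.strip("_")] = path
    # Raises KeyError if a mate is missing, which is preferable to a silently unpaired file.
    return [(mates["R1"], mates["R2"]) for _, mates in sorted(pairs.items())]


def build_config(paths, max_rd_len=150, avg_ins=500):
    """Return SOAPdenovo2 config text for one paired-end library.

    avg_ins=500 is a PLACEHOLDER, not the real insert size for these libraries.
    """
    lines = [
        f"max_rd_len={max_rd_len}",
        "[LIB]",
        f"avg_ins={avg_ins}",
        "reverse_seq=0",   # forward-reverse paired-end reads
        "asm_flags=3",     # use reads for both contig and scaffold assembly
        "rank=1",
    ]
    for r1, r2 in pair_reads(paths):
        lines += [f"q1={r1}", f"q2={r2}"]
    return "\n".join(lines) + "\n"


# Two of the files listed in the log above, deliberately out of order.
example = [
    "/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R2_001_val_2_val_2.fq.gz",
    "/gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R1_001_val_1_val_1.fq.gz",
]
config_text = build_config(example)
print(config_text)
```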

# NovaSeq Assembly – Trimmed Geoduck NovaSeq with Meraculous

Attempted to use Meraculous to assemble the trimmed geoduck NovaSeq data.

Here’s the Meraculous manual (PDF).

After a bunch of various issues (running out of hard drive space – multiple times, config file issues, typos), I’ve finally given up on running meraculous. It failed, again, saying it couldn’t find a file in a directory that meraculous created! I’ve emailed the authors and if they have an easy fix, I’ll implement it and see what happens.

Anyway, it’s all documented in the Jupyter Notebook below.

One good thing that came out of all of this is that I had to run kmergenie to identify an appropriate kmer size to use for assembly, as well as an estimated genome size (this info is needed for both Meraculous and SOAPdenovo2, which I’ll be trying next):

kmergenie output folder: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180206_kmergenie/
kmergenie HTML report (doesn’t display histograms for some reason): 20180206_kmergenie/histograms_report.html
kmer size: 117
Est. genome size: 2.17Gbp
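With a genome size estimate in hand, expected sequencing depth is just total bases divided by genome size. Here is a minimal sketch of that calculation, assuming standard four-line FASTQ records. The demo file is made up for illustration; it is not the NovaSeq data, and I haven't computed the real coverage here.

```python
import gzip
import tempfile
from pathlib import Path

GENOME_SIZE_BP = 2.17e9  # kmergenie estimate reported above


def count_bases(fastq_gz_path):
    """Sum sequence lengths in a gzipped FASTQ (assumes 4 lines per record)."""
    total = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                total += len(line.strip())
    return total


def estimated_coverage(total_bases, genome_size_bp=GENOME_SIZE_BP):
    """Average depth of coverage implied by a base count and a genome size."""
    return total_bases / genome_size_bp


# Tiny made-up FASTQ just to exercise the functions.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.fq.gz"
    with gzip.open(demo, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\nIIIIII\n")
    demo_bases = count_bases(demo)

print("Bases in demo file:", demo_bases)
print(f"Coverage of a 2.17Gbp genome: {estimated_coverage(demo_bases):.2e}x")
```

Running `count_bases` over each trimmed file and summing the results would give the total to pass to `estimated_coverage`.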

# Adapter Trimming and FASTQC – Illumina Geoduck Novaseq Data

We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.

Steven previously ran the raw data through FASTQC and there was a significant amount of adapter contamination (up to 44% in some libraries) present (see his FASTQC report here).

So, I trimmed them using TrimGalore and re-ran FASTQC on them.

This required two rounds of trimming using the “auto-detect” feature of Trim Galore.

• Round 1: remove NovaSeq adapters
• Round 2: remove standard Illumina adapters
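The two rounds can be sketched as a pair of Trim Galore calls, where round 2 takes round 1's output as its input. This is only an illustration of the file flow, not the exact commands from the notebook: the raw file names and directory names below are hypothetical, gzipped input is assumed, and only the `--paired` and `--output_dir` options are used, leaving adapter auto-detection at its default.

```python
from pathlib import Path


def trimmed_name(fastq_name, mate):
    """Predict Trim Galore's paired-mode output name for a gzipped input file."""
    for suffix in (".fastq.gz", ".fq.gz"):
        if fastq_name.endswith(suffix):
            return f"{fastq_name[:-len(suffix)]}_val_{mate}.fq.gz"
    raise ValueError(f"Expected a gzipped FASTQ file name, got: {fastq_name}")


def trim_command(r1, r2, out_dir):
    """Build (but do not run) a paired-end Trim Galore command."""
    return ["trim_galore", "--paired", "--output_dir", out_dir, r1, r2]


def two_round_plan(r1, r2, round1_dir, round2_dir):
    """Return the round 1 and round 2 commands, chaining round 1 outputs into round 2."""
    r1_trimmed = str(Path(round1_dir) / trimmed_name(Path(r1).name, 1))
    r2_trimmed = str(Path(round1_dir) / trimmed_name(Path(r2).name, 2))
    return [
        trim_command(r1, r2, round1_dir),
        trim_command(r1_trimmed, r2_trimmed, round2_dir),
    ]


# HYPOTHETICAL raw file and directory names, for illustration only.
plan = two_round_plan(
    "raw/NR005_S4_L001_R1_001.fastq.gz",
    "raw/NR005_S4_L001_R2_001.fastq.gz",
    "trimmed_round1",
    "trimmed_round2",
)
for command in plan:
    print(" ".join(command))
```

Running two rounds this way is why the final files end up with doubled suffixes such as `_val_1_val_1.fq.gz`.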

See Jupyter notebook below for the gritty details.

##### Results:

All data for this NovaSeq assembly project can be found here: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.

Round 1 Trim Galore reports: [20180125_trim_galore_reports/](http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180125_trim_galore_reports/)
Round 1 FASTQC: 20180129_trimmed_multiqc_fastqc_01
Round 1 FASTQC MultiQC overview: 20180129_trimmed_multiqc_fastqc_01/multiqc_report.html

Round 2 Trim Galore reports: 20180125_geoduck_novaseq/20180205_trim_galore_reports/
Round 2 FASTQC: 20180205_trimmed_fastqc_02/
Round 2 FASTQC MultiQC overview: 20180205_trimmed_multiqc_fastqc_02/multiqc_report.html

The astute observer might notice that “Per Base Sequence Content” generates a “Fail” warning for all samples. Per the FASTQC help, this is likely expected (because NovaSeq libraries are prepared using transposases) and doesn’t have any downstream impact on analyses.

# Data Management – Illumina Geoduck HiSeq & MiSeq Data

The HDD we received from Illumina last week only had data (i.e. fastq files) from the NovaSeq runs they performed; there was nothing from either the MiSeq or the HiSeq runs.

We contacted them about the missing data. They confirmed it was missing and uploaded the remaining data to BaseSpace.