NovaSeq Assembly – The Struggle is Real – Real Annoying!

Well, I continue to struggle to makek progress on assembling the geoduck Illumina NovaSeq data. Granted, there is a ton of data (374GB!!!!), but it’s still frustrating that we can’t get an assembly anywhere…

Here are some of the struggles so far:

SOAPdenovo2

JR-Assembler

• Can’t install one of the dependencies (SOAP error correction)
• Actually, I need to try the binary version of this, instead of the source version (the source version fails at the make step)

So, next up will trying the following two assemblers:

• JR-Assembler: Will see if SOAPec binary will work, and then run an assembly.
• AllPaths-LG: I was able to install this successfully on Mox.

Additionally, we’ve ordered some additional hard drives and will be converting the old head/master node on the Apple Xserve cluster to Linux. The old master node is a little better equipped than the other Apple Xserve “birds”, so will try to re-run Meraculous on it once we get it converted.

NovaSeq Assembly – Trimmed Geoduck NovaSeq with Meraculous

Attempted to use Meraculous to assemble the trimmed geoduck NovaSeq data.

Here’s the Meraculous manual (PDF).

After a bunch of various issues (running out of hard drive space – multiple times, config file issues, typos), I’ve finally given up on running meraculous. It failed, again, saying it couldn’t find a file in a directory that meraculous created! I’ve emailed the authors and if they have an easy fix, I’ll implement it and see what happens.

Anyway, it’s all documented in the Jupyter Notebook below.

One good thing came out of all of it is that I had to run kmergenie to identify an appopriate kmer size to use for assembly, as well as estimated genome size (this info is needed for both meraculous and SOAPdeNovo (which I’ll be trying next)):

kmergenie output folder: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180206_kmergenie/
kmergenie HTML report (doesn’t display histograms for some reason): 20180206_kmergenie/histograms_report.html
kmer size: 117
Est. genome size: 2.17Gbp