Software Install – 10x Genomics Supernova on Mox (Hyak)

Steven asked me to install Supernova (by 10x Genomics) on our Mox node.

First, need to install a dependency: bcl2fastq2
Followed Illumina bcl2fastq2 manual (PDF)

Logged into Mox and started an interactive session on a build node:

srun -p build --time=1:00:00 --pty /bin/bash

Install bcl2fastq2 dependency

Illumina bcl2fastq2 manual (PDF)

cd /gscratch/srlab/tmp
wget ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/software/bcl2fastq/bcl2fastq2-v2-20-0-tar.zip
export TMP=/gscratch/srlab/tmp/
export SOURCE=${TMP}/bcl2fastq
export BUILD=${TMP}/bcl2fastq2.20-build
export INSTALL_DIR=/gscratch/srlab/programs/bcl2fastq-v2.20
cd ${TMP}
unzip bcl2fastq2-v2-20-0-tar.zip
tar -xvzf bcl2fastq2-v2.20.0.422-Source.tar.gz
mkdir -p ${BUILD}
cd ${BUILD}
chmod ugo+x ${SOURCE}/src/configure
chmod ugo+x ${SOURCE}/src/cmake/bootstrap/installCmake.sh
${SOURCE}/src/configure --prefix=${INSTALL_DIR}
cd ${BUILD}
make install
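
As far as I understand, Supernova's FASTQ generation step wraps bcl2fastq, so the freshly built binary has to be discoverable on PATH. A minimal sketch for that, assuming the build placed the executable in ${INSTALL_DIR}/bin (the usual layout for a --prefix install). The prepend_path helper is a hypothetical name, not part of either tool:

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
prepend_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                    # already present, nothing to do
        *) PATH="$1:${PATH}" ;;
    esac
}

# Same install prefix as used in the build steps above.
INSTALL_DIR=/gscratch/srlab/programs/bcl2fastq-v2.20
prepend_path "${INSTALL_DIR}/bin"      # assumption: binary lives in bin/ under the prefix
export PATH

# Report whether the shell can now locate the binary.
if command -v bcl2fastq >/dev/null 2>&1; then
    echo "bcl2fastq found at $(command -v bcl2fastq)"
else
    echo "bcl2fastq not found on PATH"
fi
```

Putting the prepend_path call in ~/.bashrc would make the change persist across logins. The check-before-prepend keeps PATH from growing each time the file is sourced.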

Install Supernova 2.0.0

Supernova install directions

cd /gscratch/srlab/programs
wget -O supernova-2.0.0.tar.gz "http://cf.10xgenomics.com/releases/assembly/supernova-2.0.0.tar.gz?Expires=1516707075&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cDovL2NmLjEweGdlbm9taWNzLmNvbS9yZWxlYXNlcy9hc3NlbWJseS9zdXBlcm5vdmEtMi4wLjAudGFyLmd6IiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNTE2NzA3MDc1fX19XX0_&Signature=XJR7c9UlSkueydP304nKJrqomLXBH9~DWsenwlvBrplFMojbO-DPMghO09Sk6Wi5ApZSPwKB3sl1Wrnjy3qBLwr7dCoT~9oStyBpqlF~Xl2nBY6odnTzUaq3IpLyu8icIkt7DJM0GMXQTTp6rYu1PlLG31hMM5b5HZI3Tjzrhk8URbSrsG~7mm6m5-28afYHX00kT2Xfor7xr-ZSjjLe2jr99SEIARfzZjt6kUEnDMbl~3FXCHsSxXzKrkYXobGmfQhYBrey0iRyCAc9yNF7eSuBHAsqRGsP2yURVcYf3BB5nB1ZuEUo0qLgc5GlZJDQdsqDNC69HkyLCJamkJSnVg__&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA"
tar -xzvf supernova-2.0.0.tar.gz
rm supernova-2.0.0.tar.gz
cd supernova-2.0.0
supernova-cs/2.0.0/bin/supernova sitecheck > sitecheck.txt
supernova-cs/2.0.0/bin/supernova upload samwhite@uw.edu sitecheck.txt
srun -p srlab -A srlab --time=2:00:00 --pty /bin/bash
/gscratch/srlab/programs/supernova-2.0.0/supernova testrun --id=tiny

OK, looks like the test run finished successfully.


Software Crash – Olympia oyster genome assembly with Masurca on Mox

Ah, the joys of bioinformatics. I just received an email from Mox indicating that the Masurca assembly I started 11 DAYS AGO (!!) crashed.

I’m probably not going to put much effort into figuring out what went wrong, but here are some log file snippets for reference. I’ll probably drop a line to the developers to see if they have an easy way to address whatever caused the problem, but that’s about as much effort as I’m willing to put into troubleshooting this assembly.

Additionally, since this crashed, I’m not going to bother moving any of the files off of Mox. That means they will be deleted automatically by the system around Nov. 9th, 2017.

slurm-94620.out (tail)

compute_psa 6601202 2632582819
Refining alignments
Generating assembly input files
Coverage of the mega-reads less than 5 -- using the super reads as well
Coverage threshold for splitting unitigs is 138 minimum ovl 63
Running assembly
/gscratch/srlab/programs/MaSuRCA-3.2.3/bin/deduplicate_unitigs.sh: line 85: 24330 Aborted                 (core dumped) overlapStoreBuild -o $ASM_DIR/$ASM_PREFIX.ovlStore -M 65536 -g $ASM_DIR/$ASM_PREFIX.gkpStore $ASM_DIR/overlaps_dedup.ovb.gz > $ASM_DIR/overlapStore.rebuild.err 2>&1
Assembly stopped or failed, see CA.mr.
[Mon Oct 30 23:19:37 PDT 2017] Assembly stopped or failed, see CA.mr.

CA.mr. (tail)

number of threads     = 28 (OpenMP default)

ERROR:  overlapStore '/gscratch/scrubbed/samwhite/20171019_masurca_oly_assembly/CA.mr.' is incomplete; previous overlapStoreBuild probably crashed.

Failure message:

failed to unitig


Scanning overlap files to count the number of overlaps.
Found 277.972 million overlaps.
Memory limit 65536MB supplied.  Ill put 3246167525 IIDs (3435.97 million overlaps) into each of 1 buckets.
bucketizing CA.mr.
bucketizing DONE!
overlaps skipped:
               0 OBT - low quality
               0 DUP - non-duplicate overlap
               0 DUP - different library
               0 DUP - dedup not requested
terminate called after throwing an instance of std::bad_alloc
  what():  std::bad_alloc

Failed with Aborted

Backtrace (mangled):


Backtrace (demangled):

[0] overlapStoreBuild() [0x40523a]
[1] /usr/lib64/libpthread.so.0::(null) + 0xf100  [0x2af83b3c0100]
[2] /usr/lib64/libc.so.6::(null) + 0x37  [0x2af83c0395f7]
[3] /usr/lib64/libc.so.6::(null) + 0x148  [0x2af83c03ace8]
[4] /usr/lib64/libstdc++.so.6::__gnu_cxx::__verbose_terminate_handler() + 0x165  [0x2af83b62d9d5]
[5] /usr/lib64/libstdc++.so.6::(null) + 0x5e946  [0x2af83b62b946]
[6] /usr/lib64/libstdc++.so.6::(null) + 0x5e973  [0x2af83b62b973]
[7] /usr/lib64/libstdc++.so.6::(null) + 0x5eb93  [0x2af83b62bb93]
[8] /usr/lib64/libstdc++.so.6::operator new(unsigned long) + 0x7d  [0x2af83b62c12d]
[9] /usr/lib64/libstdc++.so.6::operator new[](unsigned long) + 0x9  [0x2af83b62c1c9]
[10] overlapStoreBuild() [0x402e10]
[11] /usr/lib64/libc.so.6::(null) + 0xf5  [0x2af83c025b15]
[12] overlapStoreBuild() [0x403089]
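
For reference, std::bad_alloc is what C++ throws when a memory allocation request fails, which fits the operator new[] frames in the backtrace. A quick way to pull the telling lines out of long logs like these is a grep over a few failure signatures. This is only a sketch: scan_log is a hypothetical helper, and the pattern list is an assumption based on the messages seen above, not an exhaustive set.

```shell
# Hypothetical helper: print numbered lines that match common failure signatures.
scan_log() {
    grep -nE 'bad_alloc|core dumped|Aborted|ERROR|failed' "$1"
}

# Demo on a scratch file holding three lines copied from the logs above.
demo_log=$(mktemp)
cat > "${demo_log}" <<'EOF'
bucketizing DONE!
terminate called after throwing an instance of std::bad_alloc
Failed with Aborted
EOF

scan_log "${demo_log}"
rm -f "${demo_log}"
```

On the real run, the same function could be pointed at slurm-94620.out or at the CA.mr. log files in the assembly directory.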


Software Installation – MaSuRCA v3.2.3 Assembler on Mox (Hyak)

Saw a tweet this morning about MaSuRCA and thought it would be good to try for our Olympia oyster genome assemblies, since it handles “hybrid” assemblies (i.e. short reads and long reads).

Additionally, I was excited by the “…super easy to use.” part in the description. As it turns out, that part of the Tweet is totally untrue. Here are some of the things that make it difficult to use:

  • No pre-install readme file. Without a readme file, there are no instructions/info on:
    • Necessary dependencies
    • Install command(s)
  • Initial install attempt failed with error message. Message suggests trying:
    • BOOST_ROOT=install ./install.sh
  • No post-install readme file. How do I even get started without any documentation??!!

I managed to track down the guide for this, but didn’t get it via searching the internet. I noticed that the link for the software in the original Tweet had a parent directory, so I navigated there and spotted this:

Quick start guide (PDF): ftp://ftp.genome.umd.edu/pub/MaSuRCA/MaSuRCA_QuickStartGuide.pdf

Although not a big deal, this quick start guide is for the previous version (v.3.2.2).

So, is this where we get to the “super easy to use” part? Uh, no. Check it out:

  1. Modify a config file (luckily, a template is created during install)
    • Illumina paired-end (PE) reads: An arbitrary two-letter prefix, the mean fragment (insert) size, and the fragment size standard deviation need to be supplied (!!!)
    • Illumina mate-paired (MP) reads: Called “JUMP” in config file and needs the same type of info supplied as PE reads.
    • PacBio reads: Need to be in a single FASTA file (ugh)!
    • A bunch of other stuff that I can probably leave alone.
  2. Run the masurca script (located here: MaSuRCA-3.2.3/bin/masurca). This will generate a new script (called assemble.sh).
  3. Run the assemble.sh script that was created in the previous step.
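
The steps above can be sketched roughly as follows. The merge_fasta helper and the config.txt file name are hypothetical. The MaSuRCA path matches where it ended up on Mox, and the run commands are guarded so the snippet does nothing on a machine without MaSuRCA. On Mox, the assembly step itself would normally go into a batch job rather than being run directly like this.

```shell
# Hypothetical helper: concatenate several FASTA files into one,
# since the config takes a single PacBio FASTA file.
# Usage: merge_fasta OUTPUT INPUT1 [INPUT2 ...]
merge_fasta() {
    out="$1"
    shift
    cat "$@" > "${out}"
}

# Steps 2 and 3, only attempted when MaSuRCA and an edited config are present.
MASURCA=/gscratch/srlab/programs/MaSuRCA-3.2.3/bin/masurca
CONFIG=config.txt    # assumed name for the edited copy of the template

if [ -x "${MASURCA}" ] && [ -f "${CONFIG}" ]; then
    "${MASURCA}" "${CONFIG}"    # step 2: writes assemble.sh
    ./assemble.sh               # step 3: runs the assembly
else
    echo "MaSuRCA or ${CONFIG} not found; skipping"
fi
```

One caveat with plain concatenation: each input file should end with a newline, otherwise the last sequence line of one file runs into the header of the next.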

Although not specific to MaSuRCA itself, I did encounter problems trying to install it on our Mox (Hyak) computing node.

Build node fail (ironically, this is the specific type of node that’s supposed to be used for compiling software):

OK, so I decided to try compiling it using the login node (which is not what the login node is supposed to be used for):

Login node fail:

I really didn’t want to have to put together an SBATCH script just to compile this software (which compiled without issue, except for that initial BOOST error thingy, on my local Ubuntu 16.04 LTS system), so I just tried running an interactive node and it worked!

Now, on to trying to actually run this thing…