# Troubleshooting – PB Jelly Install on Emu

I previously installed and ran PB Jelly. Despite no error messages being output, I noticed something odd during my quick post-assembly stats check: The PB Jelly numbers were identical to the input reference file. This seemed very strange and made me decide to look a bit deeper in the PB Jelly output files.
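A quick way to put numbers on that kind of check is to count sequences and total bases in both the input reference and the output assembly. This is only a minimal sketch: the function name `fasta_stats` and the file names in the usage comment are made up for illustration, and it is not the stats tool used in this post.

```shell
# Print "<number of sequences> <total bases>" for a FASTA file.
fasta_stats() {
  awk '
    /^>/ { n++; next }                        # header line: count one sequence
    { gsub(/[ \t\r]/, ""); bases += length($0) }  # sequence line: add its length
    END { printf "%d %d\n", n, bases }
  ' "$1"
}

# Hypothetical usage: identical output for both files is a red flag.
# fasta_stats reference.fasta
# fasta_stats jelly.out.fasta
```

If both files give the same two numbers, the "assembly" is probably just a copy of the input, which is what prompted the closer look here.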

As it turns out, PB Jelly did not complete successfully! Here’s a look at one of the output files (notice the error messages!):

Looking around the internet seemed to suggest that the issue could be that the blasr program wasn’t in my system PATH (blasr is located in: /home/shared/bin). So, I updated that, since /home/shared/bin wasn’t in the system PATH!:
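For reference, here is a minimal sketch of two small helpers for this kind of PATH housekeeping: one appends a directory only if it is missing, and one collapses repeated entries. The function names are my own invention, and this is not the exact edit made on Emu.

```shell
# Append a directory to PATH only if it is not already listed.
path_append() {
  case ":${PATH}:" in
    *":$1:"*) ;;                 # already present: leave PATH alone
    *) PATH="${PATH}:$1" ;;
  esac
}

# Print a colon-separated list with repeated entries removed
# (the first occurrence of each entry is kept, order is preserved).
dedupe_path() {
  printf '%s' "$1" | awk -v RS=: -v ORS=: '!seen[$0]++' | sed 's/:$//'
}

# Usage:
# path_append /home/shared/bin
# PATH=$(dedupe_path "$PATH")
# command -v blasr    # prints the full path if blasr is now found
```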

After doing this, I noticed that the PATH assignment in the /etc/environment file is incorrect: it has the $PATH variable prepended to the front of the list. This results in the system PATH appending itself to itself over and over again, producing a ridiculously long list (like in the screen cap directly above this text). So, I removed that portion and re-sourced the /etc/environment file to tidy things up.

Fingers crossed this will resolve the issue…

# DNA Isolation & Quantification – C. virginica Gonad gDNA

I isolated DNA from the Crassostrea virginica gonad samples sent by Katie Lotterhos using the E.Z.N.A. Mollusc Kit with the following modifications:

• Samples were homogenized with a plastic, disposable pestle in 350μL of ML1 Buffer
• No optional steps were used
• Eluted each in 100μL of Elution Buffer and pooled into a single sample

NOTE: Sample 034 did not process properly (no phase separation after 24:1 chloroform:IAA addition, along with the suggested additions of ML1 Buffer) and was discarded.

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 2μL of DNA sample.

Samples were stored in the same box the tissue was delivered in, in the same location in our -80C: rack 8, row 5, column 4.

#### Results:

Qubit (Google Sheet): 20171114_qubit_Cvirginica_gDNA

Ample DNA in all samples for MBDseq. (Refer to “Original Sample Conc.” column in spreadsheet.) Will let Steven & Katie know.

# Software Installation – ALPACA on Roadrunner

List of software that needed installing to run ALPACA:

Installed all software in: /home/shared/

Had to change permissions on /home/shared/. Used the following to recursively (-R) change the group ownership to the sudo group, so that all admin users can read/write in this directory:

    $ sudo chown -R :sudo /home/shared
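One thing worth noting: chown with a leading colon only changes the group that owns the files; whether that group can actually write depends on the existing permission bits. A minimal sketch of doing both steps together is below. The function name `share_with_group` is hypothetical, and it assumes you run it with sufficient privileges (e.g. via sudo) on a real shared directory.

```shell
# Hand a directory tree to a group and give that group read/write access.
# Usage: share_with_group GROUP DIR
share_with_group() {
  chgrp -R "$1" "$2"       # same effect as: chown -R :GROUP DIR
  chmod -R g+rwX "$2"      # group read/write; X adds execute only to
                           # directories and already-executable files
}

# Hypothetical usage on the shared install location:
# sudo sh -c '. ./share.sh; share_with_group sudo /home/shared'
# ls -ld /home/shared      # confirm the group and the rwx bits
```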

Compiled Celera Assembler from source (per the ALPACA requirements). This is the source file that I used: https://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.3/wgs-8.3rc2.tar.bz2/download

Added all software to my system PATH by adding the following to my ~/.bashrc file:

    ## Add bioinformatics softwares to PATH

    export PATH=${PATH}:/home/shared/alpaca:/home/shared/Bismark:/home/shared/bowtie2-2.3.3.1-linux-x86_64:/home/shared/ectools-0.1:/home/shared/PBSuite_15.8.24/bin:/home/shared/pecan/bin:/home/shared/samtools-1.6/bin:/home/shared/wgs-assembler/Linux-amd64/bin

After adding that info to the bottom of my ~/.bashrc file, I re-loaded the file into system memory by sourcing the file:

    $ source ~/.bashrc
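After sourcing, it can be worth confirming that the shell actually finds the programs. The sketch below is one way to do that; `check_tools` is a made-up helper name, and the executable names in the usage comment are my assumptions about what each package provides, not something taken from the ALPACA docs.

```shell
# Report which of the named executables can be found on PATH.
# Returns 0 if all were found, 1 if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found    %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical usage (executable names are assumptions):
# check_tools samtools bowtie2 bismark
```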

Followed the ALPACA test instructions to confirm proper installation. More specific test instructions are actually located at the top of this file: /home/shared/alpaca/scripts/run_example.sh

Changed Celera Assembler directory name:

    $ cd /home/shared/test/

##### Step 3.

    $ ../alpaca/scripts/run_example.sh

Step 3 (which executes the run_example.sh script) failed due to permission problems.

Realized the script file didn’t have execute permissions, so I added them with the following command:

    sudo chmod +x /home/shared/alpaca/scripts/run_example.sh

##### Step 4.

Continued with ALPACA Tests 2 & 3. Everything tested successfully.

Will try to get an assembly running with our PacBio and Illumina data.

# Software Crash – Olympia oyster genome assembly with Masurca on Mox

Ah, the joys of bioinformatics. I just received an email from Mox indicating that the Masurca assembly I started 11 DAYS AGO (!!) crashed.

I’m probably not going to put much effort into trying to figure out what went wrong, but here are some log file snippets for reference. I’ll probably drop a line to the developers and see if they have any easy ways to address whatever caused the problems, but that’s about as much effort as I’m willing to put into troubleshooting this assembly.

Additionally, since this crashed, I’m not going to bother moving any of the files off of Mox. That means they will be deleted automatically by the system around Nov. 9th, 2017.

slurm-94620.out (tail)

    compute_psa 6601202 2632582819
    Refining alignments
    Joining
    Generating assembly input files
    Coverage of the mega-reads less than 5 -- using the super reads as well
    Coverage threshold for splitting unitigs is 138 minimum ovl 63
    Running assembly
    /gscratch/srlab/programs/MaSuRCA-3.2.3/bin/deduplicate_unitigs.sh: line 85: 24330 Aborted (core dumped) overlapStoreBuild -o $ASM_DIR/$ASM_PREFIX.ovlStore -M 65536 -g $ASM_DIR/$ASM_PREFIX.gkpStore $ASM_DIR/overlaps_dedup.ovb.gz > $ASM_DIR/overlapStore.rebuild.err 2>&1
    Assembly stopped or failed, see CA.mr.41.15.17.0.029.log
    [Mon Oct 30 23:19:37 PDT 2017] Assembly stopped or failed, see CA.mr.41.15.17.0.029.log

CA.mr.41.15.17.0.029.log (tail)

    number of threads = 28 (OpenMP default)
    ERROR: overlapStore '/gscratch/scrubbed/samwhite/20171019_masurca_oly_assembly/CA.mr.41.15.17.0.029/genome.ovlStore' is incomplete; previous overlapStoreBuild probably crashed.
    ----------------------------------------
    Failure message:
    failed to unitig

overlapStore.rebuild.err

    Scanning overlap files to count the number of overlaps.
    Found 277.972 million overlaps.
    Memory limit 65536MB supplied. Will put 3246167525 IIDs (3435.97 million overlaps) into each of 1 buckets.
    bucketizing CA.mr.41.15.17.0.029/overlaps_dedup.ovb.gz
    bucketizing DONE!
    overlaps skipped:
    0 OBT - low quality
    0 DUP - non-duplicate overlap
    0 DUP - different library
    0 DUP - dedup not requested
    terminate called after throwing an instance of std::bad_alloc
    what(): std::bad_alloc
    Failed with Aborted
    Backtrace (mangled):
    overlapStoreBuild[0x40523a]
    /usr/lib64/libpthread.so.0(+0xf100)[0x2af83b3c0100]
    /usr/lib64/libc.so.6(gsignal+0x37)[0x2af83c0395f7]
    /usr/lib64/libc.so.6(abort+0x148)[0x2af83c03ace8]
    /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165)[0x2af83b62d9d5]
    /usr/lib64/libstdc++.so.6(+0x5e946)[0x2af83b62b946]
    /usr/lib64/libstdc++.so.6(+0x5e973)[0x2af83b62b973]
    /usr/lib64/libstdc++.so.6(+0x5eb93)[0x2af83b62bb93]
    /usr/lib64/libstdc++.so.6(_Znwm+0x7d)[0x2af83b62c12d]
    /usr/lib64/libstdc++.so.6(_Znam+0x9)[0x2af83b62c1c9]
    overlapStoreBuild[0x402e10]
    /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2af83c025b15]
    overlapStoreBuild[0x403089]
    Backtrace (demangled):
    [0] overlapStoreBuild() [0x40523a]
    [1] /usr/lib64/libpthread.so.0::(null) + 0xf100 [0x2af83b3c0100]
    [2] /usr/lib64/libc.so.6::(null) + 0x37 [0x2af83c0395f7]
    [3] /usr/lib64/libc.so.6::(null) + 0x148 [0x2af83c03ace8]
    [4] /usr/lib64/libstdc++.so.6::__gnu_cxx::__verbose_terminate_handler() + 0x165 [0x2af83b62d9d5]
    [5] /usr/lib64/libstdc++.so.6::(null) + 0x5e946 [0x2af83b62b946]
    [6] /usr/lib64/libstdc++.so.6::(null) + 0x5e973 [0x2af83b62b973]
    [7] /usr/lib64/libstdc++.so.6::(null) + 0x5eb93 [0x2af83b62bb93]
    [8] /usr/lib64/libstdc++.so.6::operator new(unsigned long) + 0x7d [0x2af83b62c12d]
    [9] /usr/lib64/libstdc++.so.6::operator new[](unsigned long) + 0x9 [0x2af83b62c1c9]
    [10] overlapStoreBuild() [0x402e10]
    [11] /usr/lib64/libc.so.6::(null) + 0xf5 [0x2af83c025b15]
    [12] overlapStoreBuild() [0x403089]
    GDB:

# Software Installation – PB Jelly Suite and Blasr on Emu

I followed along with what Sean previously did when installing on Emu, but it appears he didn’t install it in the shared location to make it accessible to all users. So, I’m installing it in the /home/shared/ directory.

### First, I need to install legacy blasr from PacBio:

Installed in:

    cd /home/shared
    git clone https://github.com/PacificBiosciences/pitchfork.git
    cd pitchfork
    git checkout legacy_blasr
    make init PREFIX=/home/shared
    make blasr PREFIX=/home/shared

Ran into this error:

    make[1]: Leaving directory '/home/shared/pitchfork/ports/thirdparty/zlib'
    make -C ports/thirdparty/hdf5 do-install
    make[1]: Entering directory '/home/shared/pitchfork/ports/thirdparty/hdf5'
    /home/shared/pitchfork/bin/pitchfork fetch --url https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.16/src/hdf5-1.8.16.tar.gz
    fetching https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.16/src/hdf5-1.8.16.tar.gz
    tar zxf hdf5-1.8.16.tar.gz -C /home/shared/pitchfork/workspace
    gzip: stdin: not in gzip format
    tar: Child returned status 1
    tar: Error is not recoverable: exiting now
    Makefile:23: recipe for target '/home/shared/pitchfork/workspace/hdf5-1.8.16' failed
    make[1]: *** [/home/shared/pitchfork/workspace/hdf5-1.8.16] Error 2
    make[1]: Leaving directory '/home/shared/pitchfork/ports/thirdparty/hdf5'
    Makefile:211: recipe for target 'hdf5' failed
    make: *** [hdf5] Error 2

Luckily, I came across this GitHub Issue that addresses this exact problem. I found the functional URL and downloaded the hdf5-1.8.16.tar.gz file to pitchfork/ports/thirdparty/hdf5.

Re-ran make blasr PREFIX=/home/shared and things proceeded without issue. As Sean noted, this part takes a long time.

Load the setup-env.sh file (located here: /home/shared/setup-env.sh):

    source setup-env.sh

Blasr install is complete!
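The "not in gzip format" message in the hdf5 step usually means the download was something other than the archive (for example an HTML error page from a dead URL). A minimal sketch of checking a manually downloaded tarball before handing it to make is below; `is_valid_gzip` is a hypothetical helper name, and this check is my own addition rather than part of the pitchfork build.

```shell
# Succeed only if the file passes gzip's integrity test.
is_valid_gzip() {
  gzip -t "$1" 2>/dev/null
}

# Usage, from the directory holding the download:
# if is_valid_gzip hdf5-1.8.16.tar.gz; then
#   echo "archive looks fine"
# else
#   echo "not a gzip archive; check the download URL" >&2
# fi
```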
### Then, install networkx v1.1, per the PB Jelly documentation:

    python -m pip install networkx==1.1

### On to PB Jelly!

Edited the setup.sh file and entered in the path to the PB Jelly install on Emu (/home/shared/PBSuite_15.8.24/):

    #/bin/bash

    #If you use a virtual env - source it here
    #source /hgsc_software/PBSuite/pbsuiteVirtualEnv/bin/activate

    #This is the path where you've install the suite.
    export SWEETPATH=/home/shared/PBSuite_15.8.24/
    #for python modules
    export PYTHONPATH=$PYTHONPATH:$SWEETPATH
    #for executables
    export PATH=$PATH:$SWEETPATH/bin/


Test it out with the test data:

1. Edit the following file to reflect the paths on Emu to find this test data: /home/shared/PBSuite_15.8.24/docs/jellyExample/Protocol.xml

<jellyProtocol>
<reference>/home/shared/PBSuite_15.8.24/docs/jellyExample/data/reference/lambda.fasta</reference>
<outputDir>/home/shared/PBSuite_15.8.24/docs/jellyExample/</outputDir>
<blasr>-minMatch 8 -minPctIdentity 70 -bestn 1 -nCandidates 20 -maxScore -500 -nproc 4 -noSplitSubreads</blasr>
</input>
</jellyProtocol>


I went through all the stages of the test data and got through it successfully. Seems ready to roll!

# Assembly Comparison – Oly PacBio Canu: Sam vs. Sean with Quast

I recently finished an assembly of our Olympia oyster PacBio data using Canu and thought it would be interesting to compare to Sean’s Canu assembly.

Granted, this isn’t a totally true comparison because I think Sean’s assembly is further “polished” using Pilon or something like that, but the Quast analysis is so quick (like < 60 seconds), that it can’t hurt.

See the Jupyter Notebook below for the full deets on running Quast.

Results:

Jupyter Notebook (GitHub): 20171023_docker_oly_pacbio_canu_comparisons.ipynb

# FAIL – Missing Data on Owl!

Uh oh. There appears to be some data that’s been removed from Owl. I noticed this earlier when trying to look at some of Sean’s data. His data should be in a folder with his name in Owl/scaphapoda.

Luckily, things are backed up using UW Google Drive:

I’ll restore the data using the backup from Google Drive, but this highlights a major issue – have we lost other data from Owl and how would we ever know??!!

I guess we need to look into some sort of solution for identifying deleted files. The Synology NAS does have a built-in app called Log Center that might offer some options. I’ll look into this.

But, speaking of using Log Center, I can’t find any record of files being removed. Oddly, the existing logs only have information for activity from this morning. Maybe because the server was upgraded over the weekend and an upgrade deletes existing logs???!!! I don’t know, but I can’t find any records about activity on scaphapoda using Log Center.

Regardless, I need to figure out a way to evaluate differences between what currently exists on Owl and what has been backed up. I think I can just use bash to create a file list of everything on Owl and then compare it to a file list of everything on the UW Google Drive. I think…
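A minimal sketch of that idea: build a sorted list of relative file paths for each location, then use comm to print paths that exist in one list but not the other. The function names are made up for illustration, and it assumes both locations are reachable as ordinary directories (for Google Drive that would mean a local sync or mount, which I haven't set up yet).

```shell
# Print a sorted list of all files under a directory, relative to it.
list_files() {
  ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Print files that exist under the first directory but not the second.
# Usage: missing_from BACKUP_DIR LIVE_DIR
missing_from() {
  list_a=$(mktemp)
  list_b=$(mktemp)
  list_files "$1" > "$list_a"
  list_files "$2" > "$list_b"
  LC_ALL=C comm -23 "$list_a" "$list_b"   # lines unique to the first list
  rm -f "$list_a" "$list_b"
}

# Hypothetical usage (both paths are placeholders):
# missing_from /path/to/gdrive_backup/scaphapoda /path/to/owl/scaphapoda
```

Anything this prints is in the backup but no longer on Owl. It only compares names, not contents, so a file that changed in place would not show up.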

# Software Installation – MaSuRCA v3.2.3 Assembler on Mox (Hyak)

Saw this tweet this morning and thought this would be good to try out for our Olympia oyster genome assemblies, as it will handle “hybrid” assemblies (i.e. short-reads and long reads):

Additionally, I was excited by the “…super easy to use.” part in the description. As it turns out, that part of the Tweet is totally untrue. Here are some of the things that make it difficult to use:

• No pre-install readme file. Without a readme file there are no instructions/info on:
  • Necessary dependencies
  • Install command(s)
• Initial install attempt failed with an error message. The message suggests trying:
  • BOOST_ROOT=install ./install.sh
• No post-install readme file. How do I even get started without any documentation??!!

I managed to track down the guide for this, but didn’t get it via searching the internet. I noticed that the link for the software in the original Tweet had a parent directory, so I navigated there and spotted this:

Quick start guide (PDF): ftp://ftp.genome.umd.edu/pub/MaSuRCA/MaSuRCA_QuickStartGuide.pdf

Although not a big deal, this quick start guide is for the previous version (v.3.2.2).

So, is this where we get to the “super easy to use” part? Uh, no. Check it out:

1. Modify a config file (luckily, a template is created during install)
   • Illumina paired-end (PE) reads: An arbitrary two-letter prefix, the mean library insert size, and the insert size standard deviation need to be supplied (!!!)
   • Illumina mate-paired (MP) reads: Called “JUMP” in the config file; needs the same type of info supplied as PE reads.
   • PacBio reads: Need to be in a single FASTA file (ugh)!
   • A bunch of other stuff that I can probably leave alone.
2. Run the masurca script (located here: MaSuRCA-3.2.3/bin/masurca). This will generate a new script (called assemble.sh).
3. Run the assemble.sh script that was created in the previous step.
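The "single FASTA file" requirement is easy enough to satisfy by concatenating the individual files. Here's a minimal sketch, followed by the two run steps as I understand them from the quick start guide. `combine_fasta`, the file names, and the config file name are placeholders of mine, not anything MaSuRCA ships.

```shell
# Concatenate FASTA files into one output file.
# awk 1 re-prints every line with a trailing newline, so a file that
# lacks a final newline cannot glue its last line onto the next header.
# Usage: combine_fasta OUTPUT INPUT...
combine_fasta() {
  out="$1"
  shift
  awk 1 "$@" > "$out"
}

# Hypothetical usage (all file names are placeholders):
# combine_fasta pacbio_all.fasta pacbio_reads/*.fasta
# MaSuRCA-3.2.3/bin/masurca config.txt    # step 2: writes assemble.sh
# ./assemble.sh                           # step 3: runs the assembly
```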

Although not specifically related to the MaSuRCA install, I did encounter problems trying to install this on our Mox (hyak) computing node.

#### Build node fail (ironically, this is the specific type of node that’s supposed to be used for compiling software):

OK, so I decided to try compiling it using the login node (which is not what the login node is supposed to be used for):

I really didn’t want to have to put together an SBATCH script just to compile this software (which compiled without issue, except for that initial BOOST error thingy, on my local Ubuntu 16.04 LTS system), so I just tried running an interactive node and it worked!

Now, on to trying to actually run this thing…

# Fail – Directory Contents Deleted!

Uh, not sure what happened here:

I was running Canu via a Docker container with a Jupyter Notebook. I previously checked on the status by looking at the Canu logs. A couple of hours later, I noticed an error message in the Jupyter terminal output. I decided to check the progress of Canu to make sure it was still running.

It turns out everything in that directory was deleted! EVERYTHING! Including the Jupyter notebook, which must be why it threw the error on the screen. Kinda scary, actually…

I guess I’ll give it another go and see what happens…