Category Archives: Miscellaneous

Tissue Sampling – Crassostrea virginica Tissues for Archiving

I figured it’d be prudent to collect some Eastern oyster (Crassostrea virginica) tissues to have around the lab.

I used one of the C. virginica oysters that I picked up from Taylor Shellfish on 20171210 for sampling.

Sampled:

  • Upper mantle (avoided area that was near gonad/white-ish)
  • Ctenidia
  • Lower mantle
  • Muscle
  • Gonad

Samples were transferred to 1.7mL snap cap tubes, frozen on dry ice, and stored @ -80C in Rack 13, Col 1, Row 5.

Software Install – MSMTP For Email Notices of Bash Job Completion on Emu (Ubuntu)

After I finally resolved the installation of PB Jelly on Emu (running Ubuntu 16.04), I’ve had a PB Jelly assembly running for the past two weeks! I’ve gotten tired of checking on its status (i.e. is it still running?) every day, so I dove in and figured out how to set up Emu to email me when the job is complete!

To get this going, I mainly followed this msmtp ArchWiki guide, but here are the specifics of how I set it up.

Step 1. Installed a mail server:

sudo apt-get install sendmail

Step 2. Installed msmtp:

sudo apt-get install msmtp

Step 3. Created the following file in my home directory (/home/sam/): ~/.msmtprc

The original contents of the file for testing were:

       # Example for a user configuration file ~/.msmtprc
       #
       # This file focuses on TLS and authentication. Features not used here include
       # logging, timeouts, SOCKS proxies, TLS parameters, Delivery Status Notification
       # (DSN) settings, and more.

       # Set default values for all following accounts.
       defaults

       # Use the mail submission port 587 instead of the SMTP port 25.
       port 587

       # Always use STARTTLS.
       tls on
       tls_starttls on
       tls_certcheck off
       # A freemail service
       account uw

       # Host name of the SMTP server
       host smtp.washington.edu

       # Envelope-from address
       from emu@uw.edu

       # Authentication. The password is given using one of five methods, see below.
       auth on
       user samwhite

       # Password method 3: Store the password directly in this file. Usually it is not
       # a good idea to store passwords in plain text files. If you do it anyway, at
       # least make sure that this file can only be read by yourself.
       password myuwpassword

       account default : uw

This is a configuration to allow emails to get sent via the Univ. of Washington email servers. Yes, I currently have my UW password saved in this file, but I address this issue below.

Step 4. Changed permissions on ~/.msmtprc to be readable/writable only by me (important, particularly if you’ve stored your password in this file!):

chmod 600 ~/.msmtprc

Step 5. Set msmtp as the sendmail program used by the mail command by adding a set command to the /etc/mail.rc file:

echo "set sendmail=/usr/bin/msmtp" | sudo tee -a /etc/mail.rc

This command pipes the output of echo to tee, which runs under sudo; tee -a appends that output to our desired file (/etc/mail.rc).
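Because tee -a appends every time it runs, repeating this step adds the same line again. Here is a minimal sketch of an append that checks first. The helper name append_line_once is hypothetical, and the usage shown targets a scratch file rather than /etc/mail.rc (which would need root).

```shell
#!/usr/bin/env bash
# append_line_once is a hypothetical helper name, not part of msmtp or mailx.
append_line_once() {
    local file="$1" line="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage against a scratch file (commented out so nothing is written on load):
# append_line_once /tmp/mail.rc.test "set sendmail=/usr/bin/msmtp"
```

Running the helper twice with the same arguments should leave a single copy of the line in the file.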

Step 6. Send a test email:

echo "Job complete!" | msmtp myuwemail@uw.edu

That will send an email with no subject and the body of the email will contain “Job complete!”.

That’s the basic set up for this.

To use it in your workflow, you’d append that command to the end of any Bash command, or put it in a separate Jupyter notebook cell that is queued to run after a previous cell completes its job.

Example:

echo "This counts as a command"; echo "Job complete!" | msmtp myuwemail@uw.edu

This will run the first echo command. When that finishes, then the email command will run. You can get fancy and have different emails in response to how the running program exits (i.e. fails or is successful) and send different email responses, but I’m not going to get into that.
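For anyone who does want different emails for success and failure, here is a minimal sketch. The names notify_on_exit, MAILER, and EMAIL_TO are hypothetical (they are not part of msmtp), and the message wording is only an example. Note that with ";" as in the example above, the email command runs regardless of how the first command exits.

```shell
#!/usr/bin/env bash
# Sketch: run a command, then mail a message that depends on its exit status.
# MAILER, EMAIL_TO and notify_on_exit are hypothetical names chosen for this example.
MAILER="${MAILER:-msmtp}"                 # program that reads a message on stdin
EMAIL_TO="${EMAIL_TO:-myuwemail@uw.edu}"  # placeholder address from the steps above

notify_on_exit() {
    # Run the command passed as arguments and remember its exit status.
    "$@"
    local status=$?
    if [ "$status" -eq 0 ]; then
        echo "Job complete!" | "$MAILER" "$EMAIL_TO"
    else
        echo "Job FAILED with exit status $status" | "$MAILER" "$EMAIL_TO"
    fi
    # Hand the original exit status back to the caller.
    return "$status"
}

# Usage (commented out so no email is sent when this file is sourced):
# notify_on_exit echo "This counts as a command"
```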

Anyway, not bad! However, we want to make this a bit nicer and more secure.


Improve security:

Step 1. Generate a GPG Key:

Follow the instructions under the Creating an Encryption Key section at this link.

DO NOT CREATE A PASSPHRASE! JUST HIT ENTER WHEN AT THAT STEP.

Technically, this does not follow proper security protocols, but this is better than having a plain text password, and setting it up this way is the only way the mail program will send without prompting the user for a password (which kills the automation we’re trying to achieve).

Step 2. Create an encrypted password file:

gpg --encrypt -o ~/.msmtp-password.gpg -r youremailaddress -

After entering that, type your UW email password (NOTE: You will not receive a new prompt, so just type it in), and then press Enter. Then, press Ctrl-d.

Step 3. Add the following line to your ~/.msmtprc file:

passwordeval    "gpg --quiet --for-your-eyes-only --no-tty --decrypt ~/.msmtp-password.gpg"

Here’s what the file looks like now:

       # Example for a user configuration file ~/.msmtprc
       #
       # This file focuses on TLS and authentication. Features not used here include
       # logging, timeouts, SOCKS proxies, TLS parameters, Delivery Status Notification
       # (DSN) settings, and more.

       # Set default values for all following accounts.
       defaults

       # Use the mail submission port 587 instead of the SMTP port 25.
       port 587

       # Always use STARTTLS.
       tls on
       tls_starttls on
       tls_certcheck off

       # Email account nickname
       account uw

       # Host name of the SMTP server
       host smtp.washington.edu

       # Envelope-from address
       from emu@uw.edu

       # Authentication. The password is given using one of five methods, see below.
       auth on
       user samwhite


       # Password method 2: Store the password in an encrypted file, and tell msmtp
       # which command to use to decrypt it. This is usually used with GnuPG, as in
       # this example. Usually gpg-agent will ask once for the decryption password.
       passwordeval    "gpg --quiet --for-your-eyes-only --no-tty --decrypt ~/.msmtp-password.gpg"

       account default : uw

Step 4. Change permissions on ~/.msmtp-password.gpg so it’s only readable/writable by you:

chmod 600 ~/.msmtp-password.gpg

Step 5. Send a test email like before:

echo "Job complete!" | msmtp myuwemail@uw.edu

That’s it for security.


Add a subject to the emails:

Step 1. Create ~/.default_subject.mail and add the following lines to the file (substitute your own email address):

To: myuwemail@uw.edu
From: [EMU]
Subject: JOB COMPLETE!

Feel free to change the Subject and/or From info to whatever you’d like.

Step 2. Send message using ~/.default_subject.mail:

cat ~/.default_subject.mail | msmtp myuwemail@uw.edu

To use this in your workflow, do the same as before: append the command immediately above to the end of any Bash command.
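If you want a subject and a message body in the same email, the headers and the body have to be separated by a blank line. Here is a minimal sketch that builds such a message on the fly; compose_message is a hypothetical helper name and the subject/body text is only an example.

```shell
#!/usr/bin/env bash
# compose_message is a hypothetical helper: it prints mail headers, a blank
# line, and then the body, ready to be piped into msmtp.
compose_message() {
    local to="$1" subject="$2" body="$3"
    printf 'To: %s\n' "$to"
    printf 'Subject: %s\n' "$subject"
    printf '\n'              # blank line separates the headers from the body
    printf '%s\n' "$body"
}

# Usage (commented out so no email is sent when this file is sourced):
# compose_message myuwemail@uw.edu "JOB COMPLETE!" "Finished on $(hostname)" | msmtp myuwemail@uw.edu
```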


Make it short & sweet

Appending those lines is going to be difficult to remember, is annoying to type out, and displays your email address (particularly if using a publicly hosted Jupyter notebook like most of us in lab do). Here’s a nice way to remedy that.

Step 1. Add email address as variable in ~/.bashrc:

Add the following lines to the end of your ~/.bashrc file:

# Email address
export EMAIL=myuwemail@uw.edu

Your email address is now saved in the variable $EMAIL. You will need to use the following command to load that information:

source ~/.bashrc

Verify that it worked:

echo "$EMAIL"

That should print your email address, which is now ready to be used!

Step 2. Add alias for full mail command to ~/.bash_aliases file:

echo "alias emailme='cat ~/.default_subject.mail | msmtp \"\$EMAIL\"'" >> ~/.bash_aliases

Verify that it worked:

source ~/.bash_aliases
emailme

So, from now on, all you have to do is append the command emailme to the end of any Bash commands and you’ll get an email when the job is finished!!! You can edit Steps 1 & 2 to use a variable other than “EMAIL” and an alias other than “emailme” – use whatever you’d like.
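One caveat: by default, Bash only expands aliases in interactive shells, so an alias may not be available inside a script. Here is a minimal sketch of the same idea as a shell function, which could go in ~/.bashrc instead. The MAILER and SUBJECT_FILE variables are hypothetical overrides added for this example; by default the function uses msmtp and ~/.default_subject.mail as in the steps above.

```shell
#!/usr/bin/env bash
# Function version of the emailme alias. MAILER and SUBJECT_FILE are
# hypothetical override variables; EMAIL is the variable set in ~/.bashrc.
emailme() {
    if [ -z "${EMAIL:-}" ]; then
        echo "emailme: EMAIL is not set" >&2
        return 1
    fi
    # Feed the header file to the mailer on stdin, with EMAIL as the recipient.
    "${MAILER:-msmtp}" "$EMAIL" < "${SUBJECT_FILE:-$HOME/.default_subject.mail}"
}

# Usage (commented out so no email is sent when this file is sourced):
# some_long_running_command; emailme
```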

DNA Sonication & Bioanalyzer – C. virginica gDNA for MeDIP

I transferred 8ug (136uL) of Crassostrea virginica gDNA (isolated earlier today) to two separate 1.7mL snap cap tubes for sonication/shearing.

I performed shearing at the NOAA Northwest Fisheries Science Center, using the Qsonica Q800R. Mackenzie Gavery assisted me.

Target fragment size was ~500bp.

Samples were run at the same time with the following settings:

  • 10 minutes
  • 30 seconds on, 30 seconds off
  • 25% power

After sonication, fragmentation was assessed using the Seeb Lab’s Bioanalyzer 2100 (Agilent) and the DNA 12000 Chip Kit (Agilent). NOTE: All of the reagents and the chips were past their expiration dates (most in June 2016).

Results:

Agilent 2100 Bioanalyzer Expert file (XAD): 2100 expert_DNA 12000_DE72902486_2017-12-11_13-45-31.xad (http://owl.fish.washington.edu/Athaliana/2100%20expert_DNA%2012000_DE72902486_2017-12-11_13-45-31.xad)

Fragmentation was successful, and pretty consistent.

Both samples appear to have an average fragment size of ~420bp. Will proceed with MeDIP, once reagents are received.

Unsheared gDNA:

DNA Isolation & Quantification – Crassostrea virginica Mantle gDNA

DNA was isolated from a single adult Eastern oyster (Crassostrea virginica) for a pilot project with Qiagen to test their new DNA bisulfite conversion kit. The oyster was obtained yesterday afternoon (20171210) from the Taylor Shellfish Pioneer Square location. The oyster was stored @ 4C O/N.

The oyster was shucked and four pieces of upper mantle tissue (~35mg each) were snap frozen in liquid nitrogen (LN2). Tissues were pulverized under LN2 and then DNA was isolated separately from each sample using the E.Z.N.A. Mollusc DNA Kit (Omega) according to the manufacturer’s protocol.

Samples were eluted with 100uL of Elution Buffer and were pooled into a single tube.

The gDNA was quantified using the Qubit 3.0 (Invitrogen) and Qubit dsDNA Broad Range Kit (Invitrogen), using 5uL of sample.

Results:

Qubit (Google Sheet): 20171211_qubit_virginica_DNA

Concentration is 58.4ng/uL.

That makes the total yield ~23.36ug (23360ng). This is more than enough to perform two separate MeDIP preps and two separate reduced representation digestions with MspI.
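The yield figure follows from the concentration and the pooled volume. Here is a minimal sketch of the arithmetic, assuming the pooled volume is 400uL (four elutions of 100uL each, ignoring the 5uL used for the Qubit reading); total_yield is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# total_yield is a hypothetical helper: concentration (ng/uL) x volume (uL).
total_yield() {
    local conc_ng_per_ul="$1" volume_ul="$2"
    awk -v c="$conc_ng_per_ul" -v v="$volume_ul" \
        'BEGIN { printf "%.0f ng (%.2f ug)\n", c * v, c * v / 1000 }'
}

# 58.4 ng/uL in an assumed 400 uL pooled volume:
total_yield 58.4 400   # prints: 23360 ng (23.36 ug)
```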

Will proceed with shearing of DNA for MeDIP.

Troubleshooting – PB Jelly Install on Emu Continued

The last “fix” didn’t fix everything.

This time, I received an error message that was related to blasr. Some internet searching revealed that I needed to have various library files saved to a variable named: $LD_LIBRARY_PATH

To fix this, I added the following line to the /etc/bash.bashrc file:

export "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/home/shared/lib:"

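The line uses a Bash parameter expansion (${VAR:+...}) to test whether $LD_LIBRARY_PATH is already set and non-empty; only then are the existing value and a ":" separator placed in front of the new directory. This prevents $LD_LIBRARY_PATH from having a leading ":".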

As usual, the solution to that problem was found courtesy of StackExchange (#162891).

Also, by putting this line in the /etc/bash.bashrc file, it makes the variable available for all users.
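To see what that expansion does, here is a minimal sketch that applies the same pattern to an ordinary variable; append_dir is a hypothetical helper name. This sketch leaves off the trailing ":" used in the line above, since (per the ld.so man page, as far as I can tell) an empty entry in the list refers to the current working directory.

```shell
#!/usr/bin/env bash
# append_dir is a hypothetical helper that joins an existing colon-separated
# list and a new directory without producing a leading ":".
append_dir() {
    local current="$1" new_dir="$2"
    # ${current:+${current}:} expands to "<current>:" only when current is
    # set and non-empty; otherwise it expands to nothing.
    printf '%s\n' "${current:+${current}:}${new_dir}"
}

append_dir "" /home/shared/lib                # prints: /home/shared/lib
append_dir /usr/local/lib /home/shared/lib    # prints: /usr/local/lib:/home/shared/lib
```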

Below are some screen caps to document the process:

Realization that PB Jelly still wasn't going to work:

Identify location of file listed in error message:

Add command to /etc/bash.bashrc to set $LD_LIBRARY_PATH:

Verify $LD_LIBRARY_PATH:

Verify blasr can run:

Troubleshooting – PB Jelly Install on Emu

I previously installed and ran PB Jelly. Despite no error messages being output, I noticed something odd during my quick post-assembly stats check: The PB Jelly numbers were identical to the input reference file. This seemed very strange and made me decide to look a bit deeper in the PB Jelly output files.

As it turns out, PB Jelly did not complete successfully! Here’s a look at one of the output files (notice the error messages!):

Looking around the internet suggested that the issue could be that the blasr program wasn’t in my system PATH (blasr is located in /home/shared/bin). Sure enough, /home/shared/bin wasn’t in the system PATH, so I added it:

After doing this, I noticed that the PATH assignment in the /etc/environment file is incorrect – it has the $PATH variable at the front of the list. This results in the system PATH appending itself to itself over and over again, resulting in a ridiculously long list (like in the screen cap directly above this text). So, I removed that portion and re-sourced the /etc/environment file to tidy things up.
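For a PATH that has already ballooned in the current shell, here is a minimal sketch that collapses repeated entries while keeping the first occurrence of each. The function name dedupe_path is hypothetical, and this only cleans up a string; it does not edit /etc/environment.

```shell
#!/usr/bin/env bash
# dedupe_path is a hypothetical helper: it prints a colon-separated list with
# duplicate (and empty) entries removed, preserving first-seen order.
dedupe_path() {
    local input="$1" entry result=""
    local IFS=':'                 # split the list on ":" inside this function only
    for entry in $input; do
        if [ -z "$entry" ]; then
            continue              # skip empty entries
        fi
        case ":$result:" in
            *":$entry:"*) ;;      # already present, skip it
            *) result="${result:+$result:}$entry" ;;
        esac
    done
    printf '%s\n' "$result"
}

dedupe_path "/usr/bin:/bin:/usr/bin:/home/shared/bin:/bin"
# prints: /usr/bin:/bin:/home/shared/bin
```

To apply it to the live shell, something like PATH="$(dedupe_path "$PATH")" should work, assuming no PATH entries contain glob characters.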

Fingers crossed this will resolve the issue…

DNA Isolation & Quantification – C. virginica Gonad gDNA

I isolated DNA from the Crassostrea virginica gonad samples sent by Katie Lotterhos using the E.Z.N.A. Mollusc Kit with the following modifications:

  • Samples were homogenized with plastic, disposable pestle in 350μL of ML1 Buffer
  • No optional steps were used
  • Eluted each in 100μL of Elution Buffer and pooled into a single sample

NOTE: Sample 034 did not process properly (no phase separation after 24:1 chloroform:IAA addition – along with suggested additions of ML1 Buffer) and was discarded.

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 2μL of DNA sample.

Samples were stored in the same box the tissue was delivered in, in the same location in our -80C: rack 8, row 5, column 4.

Results:

Qubit (Google Sheet): 20171114_qubit_Cvirginica_gDNA

Ample DNA in all samples for MBDseq. (Refer to “Original Sample Conc.” column in spreadsheet.)

Will let Steven & Katie know.

Software Installation – ALPACA on Roadrunner

List of software that needed installing to run ALPACA:

Installed all software in:

/home/shared/

Had to change the group ownership of /home/shared/. Used the following to recursively (-R) assign the directory to the sudo group, so that all admin (i.e. sudo group) users can read/write in this directory (subject to the group permission bits):

$ sudo chown -R :sudo /home/shared

Compiled Celera Assembler from source (per the ALPACA requirements). This is the source file that I used: https://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.3/wgs-8.3rc2.tar.bz2/download

Added all software to my system PATH by adding the following to my ~/.bashrc file:

## Add bioinformatics software to PATH

export PATH=${PATH}:\
/home/shared/alpaca:\
/home/shared/Bismark:\
/home/shared/bowtie2-2.3.3.1-linux-x86_64:\
/home/shared/ectools-0.1:\
/home/shared/PBSuite_15.8.24/bin:\
/home/shared/pecan/bin:\
/home/shared/samtools-1.6/bin:\
/home/shared/wgs-assembler/Linux-amd64/bin

After adding that info to the bottom of my ~/.bashrc file, I re-loaded the file into the current shell by sourcing the file:

$ source ~/.bashrc

Followed the ALPACA test instructions to confirm proper installation. More specific test instructions are actually located at the top of this file: /home/shared/alpaca/scripts/run_example.sh

Changed Celera Assembler directory name:

$ mv /home/shared/wgs-8.3rc2 /home/shared/wgs-assembler

Step 1.

$ mkdir /home/shared/test

Step 2.

$ cd /home/shared/test/

Step 3.

$ ../alpaca/scripts/run_example.sh

Step 3 (which executes the run_example.sh script) failed due to permission problems.

I realized the script file didn’t have execute permissions, so I added them with the following command:

$ sudo chmod +x /home/shared/alpaca/scripts/run_example.sh

Step 4. Continued with ALPACA Tests 2 & 3.

Everything tested successfully. Will try to get an assembly running with our PacBio and Illumina data.

Software Crash – Olympia oyster genome assembly with Masurca on Mox

Ah, the joys of bioinformatics. I just received an email from Mox indicating that the Masurca assembly I started 11 DAYS AGO (!!) crashed.

I’m probably not going to put much effort into trying to figure out what went wrong, but here are some log file snippets for reference. I’ll probably drop a line to the developers and see if they have any easy ways to address whatever caused the problems, but that’s about as much effort as I’m willing to put into troubleshooting this assembly.

Additionally, since this crashed, I’m not going to bother moving any of the files off of Mox. That means they will be deleted automatically by the system around Nov. 9th, 2017.


slurm-94620.out (tail)

compute_psa 6601202 2632582819
Refining alignments
Joining
Generating assembly input files
Coverage of the mega-reads less than 5 -- using the super reads as well
Coverage threshold for splitting unitigs is 138 minimum ovl 63
Running assembly
/gscratch/srlab/programs/MaSuRCA-3.2.3/bin/deduplicate_unitigs.sh: line 85: 24330 Aborted                 (core dumped) overlapStoreBuild -o $ASM_DIR/$ASM_PREFIX.ovlStore -M 65536 -g $ASM_DIR/$ASM_PREFIX.gkpStore $ASM_DIR/overlaps_dedup.ovb.gz > $ASM_DIR/overlapStore.rebuild.err 2>&1
Assembly stopped or failed, see CA.mr.41.15.17.0.029.log
[Mon Oct 30 23:19:37 PDT 2017] Assembly stopped or failed, see CA.mr.41.15.17.0.029.log

CA.mr.41.15.17.0.029.log (tail)

number of threads     = 28 (OpenMP default)

ERROR:  overlapStore '/gscratch/scrubbed/samwhite/20171019_masurca_oly_assembly/CA.mr.41.15.17.0.029/genome.ovlStore' is incomplete; previous overlapStoreBuild probably crashed.

----------------------------------------
Failure message:

failed to unitig

overlapStore.rebuild.err

Scanning overlap files to count the number of overlaps.
Found 277.972 million overlaps.
Memory limit 65536MB supplied.  Ill put 3246167525 IIDs (3435.97 million overlaps) into each of 1 buckets.
bucketizing CA.mr.41.15.17.0.029/overlaps_dedup.ovb.gz
bucketizing DONE!
overlaps skipped:
               0 OBT - low quality
               0 DUP - non-duplicate overlap
               0 DUP - different library
               0 DUP - dedup not requested
terminate called after throwing an instance of std::bad_alloc
  what():  std::bad_alloc

Failed with Aborted

Backtrace (mangled):

overlapStoreBuild[0x40523a]
/usr/lib64/libpthread.so.0(+0xf100)[0x2af83b3c0100]
/usr/lib64/libc.so.6(gsignal+0x37)[0x2af83c0395f7]
/usr/lib64/libc.so.6(abort+0x148)[0x2af83c03ace8]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165)[0x2af83b62d9d5]
/usr/lib64/libstdc++.so.6(+0x5e946)[0x2af83b62b946]
/usr/lib64/libstdc++.so.6(+0x5e973)[0x2af83b62b973]
/usr/lib64/libstdc++.so.6(+0x5eb93)[0x2af83b62bb93]
/usr/lib64/libstdc++.so.6(_Znwm+0x7d)[0x2af83b62c12d]
/usr/lib64/libstdc++.so.6(_Znam+0x9)[0x2af83b62c1c9]
overlapStoreBuild[0x402e10]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2af83c025b15]
overlapStoreBuild[0x403089]

Backtrace (demangled):

[0] overlapStoreBuild() [0x40523a]
[1] /usr/lib64/libpthread.so.0::(null) + 0xf100  [0x2af83b3c0100]
[2] /usr/lib64/libc.so.6::(null) + 0x37  [0x2af83c0395f7]
[3] /usr/lib64/libc.so.6::(null) + 0x148  [0x2af83c03ace8]
[4] /usr/lib64/libstdc++.so.6::__gnu_cxx::__verbose_terminate_handler() + 0x165  [0x2af83b62d9d5]
[5] /usr/lib64/libstdc++.so.6::(null) + 0x5e946  [0x2af83b62b946]
[6] /usr/lib64/libstdc++.so.6::(null) + 0x5e973  [0x2af83b62b973]
[7] /usr/lib64/libstdc++.so.6::(null) + 0x5eb93  [0x2af83b62bb93]
[8] /usr/lib64/libstdc++.so.6::operator new(unsigned long) + 0x7d  [0x2af83b62c12d]
[9] /usr/lib64/libstdc++.so.6::operator new[](unsigned long) + 0x9  [0x2af83b62c1c9]
[10] overlapStoreBuild() [0x402e10]
[11] /usr/lib64/libc.so.6::(null) + 0xf5  [0x2af83c025b15]
[12] overlapStoreBuild() [0x403089]

GDB:

Software Installation – PB Jelly Suite and Blasr on Emu

I followed along with what Sean previously did when installing on Emu, but it appears he didn’t install it in the shared location to make it accessible to all users. So, I’m installing it in the /home/shared/ directory.

First, I need to install legacy blasr from PacBio:

Installed in /home/shared/:

cd /home/shared
git clone https://github.com/PacificBiosciences/pitchfork.git
cd pitchfork
git checkout legacy_blasr
make init PREFIX=/home/shared
make blasr  PREFIX=/home/shared

Ran into this error:

make[1]: Leaving directory '/home/shared/pitchfork/ports/thirdparty/zlib'
make -C ports/thirdparty/hdf5 do-install
make[1]: Entering directory '/home/shared/pitchfork/ports/thirdparty/hdf5'
/home/shared/pitchfork/bin/pitchfork fetch --url https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.16/src/hdf5-1.8.16.tar.gz
fetching https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.16/src/hdf5-1.8.16.tar.gz
tar zxf hdf5-1.8.16.tar.gz -C /home/shared/pitchfork/workspace

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Makefile:23: recipe for target '/home/shared/pitchfork/workspace/hdf5-1.8.16' failed
make[1]: *** [/home/shared/pitchfork/workspace/hdf5-1.8.16] Error 2
make[1]: Leaving directory '/home/shared/pitchfork/ports/thirdparty/hdf5'
Makefile:211: recipe for target 'hdf5' failed
make: *** [hdf5] Error 2

Luckily, I came across this GitHub Issue that addresses this exact problem.

I found the functional URL and downloaded the hdf5-1.8.16.tar.gz file to pitchfork/ports/thirdparty/hdf5. Re-ran make blasr PREFIX=/home/shared and things proceeded without issue. As Sean noted, this part takes a long time.
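The "not in gzip format" error suggests that whatever was fetched from the original URL was not actually a gzip archive (possibly an error page, though that is a guess). Here is a minimal sketch for checking a downloaded tarball before handing it to tar; is_gzip is a hypothetical helper name built on gzip -t, which tests a compressed file's integrity.

```shell
#!/usr/bin/env bash
# is_gzip is a hypothetical helper: it succeeds only if the file passes
# gzip's integrity test, and stays quiet either way.
is_gzip() {
    gzip -t "$1" 2>/dev/null
}

# Usage (commented out; run it from the directory holding the download):
# if is_gzip hdf5-1.8.16.tar.gz; then echo "looks like gzip"; else echo "not a gzip file"; fi
```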

Load the setup-env.sh file (this is located here: /home/shared/setup-env.sh):

source setup-env.sh

Blasr install is complete!

Then, install networkx v1.1, per the PB Jelly documentation:

python -m pip install networkx==1.1

On to PB Jelly!

Edited the setup.sh file and entered in the path to the PB Jelly install on Emu (/home/shared/PBSuite_15.8.24/):

#!/bin/bash

#If you use a virtual env - source it here
#source /hgsc_software/PBSuite/pbsuiteVirtualEnv/bin/activate

#This is the path where you've install the suite.
export SWEETPATH=/home/shared/PBSuite_15.8.24/
#for python modules 
export PYTHONPATH=$PYTHONPATH:$SWEETPATH
#for executables 
export PATH=$PATH:$SWEETPATH/bin/

Test it out with the test data:

  1. Edit the following file to reflect the paths on Emu to find this test data: /home/shared/PBSuite_15.8.24/docs/jellyExample/Protocol.xml

<jellyProtocol>
    <reference>/home/shared/PBSuite_15.8.24/docs/jellyExample/data/reference/lambda.fasta</reference>
    <outputDir>/home/shared/PBSuite_15.8.24/docs/jellyExample/</outputDir>
    <blasr>-minMatch 8 -minPctIdentity 70 -bestn 1 -nCandidates 20 -maxScore -500 -nproc 4 -noSplitSubreads</blasr>
    <input baseDir="/home/shared/PBSuite_15.8.24/docs/jellyExample/data/reads/">
        <job>filtered_subreads.fastq</job>
    </input>
</jellyProtocol>

I went through all the stages of the test data and got through it successfully. Seems ready to roll!