# Data Management – Convert Oly PacBio H5 to FASTQ

After working with all of this Olympia oyster genome sequencing data, I remembered that we had an old, singular PacBio SMRT cell file (from June 2013). This file didn’t seem to be included in any recent assemblies of Sean’s or mine. This is most likely because we have it in the PacBio H5 format and not in FASTQ.

I installed PacBio’s pbh5tools on my computer (swoose), converted the file with bash5tools.py, and moved the output to owl/nightingales/O_lurida. Here’s the conversion command:

```shell
python bash5tools.py /mnt/owl/nightingales/O_lurida/m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5 --outType fastq
```
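A quick way to sanity-check the converted file is to count its reads and bases. This is a minimal sketch, not part of pbh5tools. It assumes the FASTQ uses unwrapped 4-line records (header, sequence, "+", qualities), and `fastq_stats` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -eu

# Count reads and total bases in a FASTQ file.
# Assumes unwrapped 4-line records, so the second line of each record
# (line numbers 2, 6, 10, ...) is the sequence line.
fastq_stats() {
  awk 'NR % 4 == 2 { n++; bp += length($0) }
       END { printf "%d reads\t%.0f bp\n", n, bp }' "$1"
}
```

Usage: `fastq_stats m130619_081336_42134_c100525122550000001823081109281326_s1_p0.fastq`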

I generated an MD5 checksum and appended it to the checksums.md5 file in /owl/nightingales/O_lurida using the following command:

```shell
md5sum m130619_081336_42134_c100525122550000001823081109281326_s1_p0.fastq | awk '{print $2 " = "$1}' >> checksums.md5
```

The command above pipes the md5sum output to awk, which reorders the fields to match the existing format of the checksums.md5 file (i.e. filename = hash).
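Because checksums.md5 stores entries as "filename = hash", it can't be handed to `md5sum -c` directly, since md5sum expects "hash  filename". A minimal sketch, assuming GNU coreutils `md5sum` and a hypothetical helper name `verify_md5_entry`, that flips a stored entry back into md5sum's layout and re-checks the file:

```shell
#!/usr/bin/env bash
set -eu

# Re-check one file against its entry in a "filename = hash" checksum list.
# Usage: verify_md5_entry <checksum_list> <filename>
verify_md5_entry() {
  local list="$1" file="$2"
  # Turn "filename = hash" back into md5sum's "hash  filename" layout,
  # keep only lines for the requested file, then let md5sum -c compare.
  awk -v f="$file" '$1 == f && $2 == "=" { print $3 "  " $1 }' "$list" \
    | md5sum -c -
}
```

Run it from the directory holding the file; it prints "<filename>: OK" and exits 0 on a match, and exits non-zero on a mismatch or if the file has no entry.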

Will generate updated PacBio assemblies with Canu and/or Racon.

# Genome Assembly – Olympia oyster Redundans/Canu vs. Redundans/Racon

Decided to compare Redundans run with Canu as the reference and Redundans run with Racon as the reference. Both reference assemblies were generated from just our PacBio data.

Jupyter notebook (GitHub): 20171005_docker_oly_redundans.ipynb

Notebook is also embedded at the end of this post.

Results:

It should be noted that the paired reads from each of the following BGI mate-pair Illumina libraries did not assemble, just like the last time I used them:

• 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_2.fq.gz
• 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_2.fq.gz
• 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_2.fq.gz
• 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_2.fq.gz

Redundans with Canu as the reference produced the better assembly, suggesting that the Canu assembly is the better of the two PacBio assemblies (which we had already suspected).

QUAST comparison using default settings:

QUAST comparison using the `--scaffolds` setting:

# Assembly Comparisons – Olympia oyster genome assemblies

### — UPDATE 20171009 —

Having run through this a bunch of times now, I realized that the analysis below incorrectly identifies the outputs from Sean’s Redundans runs. The correct output from each of those runs should be the “scaffolds.reduced.fa” FASTA files. The “contigs.fa” files that I linked to below are actually the assemblies produced by other programs, which are required as an input for Redundans.

I recently completed an assembly of the UW PacBio sequencing data using Racon and wanted some assembly stats, as well as a way to compare this assembly to the assemblies Sean had completed.

Additionally, Steven recently performed an assembly comparison and I noticed he got some odd results. Specifically, of the three assemblies he compared (PacBio x 1, Illumina x 2), both of the Illumina assemblies had a large quantity of “Ns” in the assemblies. This didn’t seem right and the comparison program he used (QUAST) spit out a message indicating that it seemed like scaffolds were used, instead of contigs. So, I thought I’d give it a shot and see if I could track down non-scaffolded assemblies produced by Sean.

Jupyter notebook (GitHub): 20171003_docker_oly_assembly_comparisons.ipynb

First, I compared the following six assemblies (FASTA files) using QUAST:

Sean’s Assemblies:

Sam’s Assembly:

QUAST output directory: http://owl.fish.washington.edu/Athaliana/20171003_quast_oly_genome_assemblies/

Here’s the assembly comparison of all assemblies:

Interactive version of that graphic is here: http://owl.fish.washington.edu/Athaliana/20171003_quast_oly_genome_assemblies/report.html

The first thing that jumps out to me is that two of the Illumina assemblies, which used different assemblers(!!), have the EXACT same assembly stats. This seems extremely unlikely. I’ve double-checked my Jupyter notebook to make sure that I didn’t assign the same file by accident (see Input #6).

Very strange!
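Beyond re-checking the notebook, one way to rule out a file mix-up is to compare the two FASTA files directly. A minimal sketch (hypothetical helper names `seq_digest` and `same_assembly`) that first checks for byte-identical files, then compares the set of sequences while ignoring headers, record order, line wrapping, and letter case:

```shell
#!/usr/bin/env bash
set -eu

# Hash the set of sequences in a FASTA: unwrap each record to one line,
# uppercase it, drop headers, sort, and md5 the result.
seq_digest() {
  awk '/^>/ { if (s != "") print s; s = ""; next }
       { s = s toupper($0) }
       END { if (s != "") print s }' "$1" \
    | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Report whether two assembly FASTA files are identical, contain the
# same sequences under different headers/layout, or differ.
same_assembly() {
  local a="$1" b="$2"
  if cmp -s "$a" "$b"; then
    echo "identical files"
  elif [ "$(seq_digest "$a")" = "$(seq_digest "$b")" ]; then
    echo "same set of sequences (headers, order, or line wrapping differ)"
  else
    echo "different"
  fi
}
```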

I also noticed that Sean’s first Redundans assembly has a ton of “Ns”, suggesting that it’s actually a scaffolded assembly. As with Steven’s QUAST run, QUAST outputs a message suggesting the `--scaffolds` option for this file.
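The N content that hints at a scaffolded assembly can also be measured directly. This is a rough sketch, not how QUAST does it: a hypothetical `fasta_n_stats` helper that reports sequence count, total bp, and the number and percentage of Ns in a FASTA file. What N percentage counts as "scaffolded" is a judgment call.

```shell
#!/usr/bin/env bash
set -eu

# Summarize a FASTA: number of sequences, total bp, N count, and % N.
# Lowercase n's are counted too (sequence lines are uppercased first).
fasta_n_stats() {
  awk '
    /^>/ { nseq++; next }
    {
      seq = toupper($0)
      bp += length(seq)
      ns += gsub(/N/, "", seq)   # gsub returns the number of Ns removed
    }
    END {
      pct = (bp > 0) ? 100 * ns / bp : 0
      printf "%d sequences\t%.0f bp\t%.0f N\t%.2f%% N\n", nseq, bp, ns, pct
    }' "$1"
}
```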

The other thing I noticed is that the two PacBio assemblies (Canu & Racon) differ hugely in total length (~13,000,000 bp)! I ran a QUAST comparison of just those two for easier viewing/comparison (http://owl.fish.washington.edu/Athaliana/20171003_quast_oly_pacbio_assemblies/):

Interactive version of that graphic is here: http://owl.fish.washington.edu/Athaliana/20171003_quast_oly_pacbio_assemblies/report.html

The fact that there is such a large discrepancy in the total number of bp between these two assemblies really leads me to believe that I am missing a FASTQ file from my assembly. I’m going to go back and see if that is indeed the case or if this difference between the assemblies is real.
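A simple way to check for a missing input is to compare the FASTQ files on disk against the list of files actually given to the assembler. A minimal sketch, assuming a hypothetical plain-text list with one FASTQ path per line (the real inputs would have to be pulled from the notebook), with `missing_inputs` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
set -eu

# Print FASTQ files present in <dir> whose basenames do not appear in <used_list>.
# Usage: missing_inputs <dir> <used_list>
missing_inputs() {
  local dir="$1" used="$2"
  find "$dir" -maxdepth 1 -type f -name '*.fastq' \
    | sed 's#.*/##' \
    | awk -v used="$used" '
        BEGIN {
          # Load basenames of the files that were used as assembly input.
          while ((getline line < used) > 0) { sub(/.*\//, "", line); seen[line] = 1 }
        }
        !($0 in seen)' \
    | LC_ALL=C sort
}
```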

Here’s an embedded version of my Jupyter notebook:

# Genome Assembly – Olympia oyster PacBio minimap/miniasm/racon

In this GitHub Issue, Steven had suggested I try out the minimap/miniasm/racon pipeline for assembling our Olympia oyster PacBio data.

I followed the pipeline described by this paper: http://matzlab.weebly.com/uploads/7/6/2/2/76229469/racon.pdf.

Previously, I ran the first part of the pipeline: minimap.

This notebook entry just contains the miniasm execution. Will follow with racon.

Jupyter Notebook (GitHub): 20170918_docker_pacbio_oly_miniasm0.2.ipynb
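For reference, miniasm writes its assembly as GFA, which has to be converted to FASTA before polishing with racon. The commands in the comments follow the miniasm README and are not necessarily the exact ones used in the notebook. The `gfa_to_fasta` helper is a hypothetical sketch that keeps only segment (S) lines, assuming miniasm was given the reads with `-f` so segments carry sequence, and skipping any segment whose sequence field is the GFA placeholder `*`:

```shell
#!/usr/bin/env bash
set -eu

# Assumed upstream steps, following the miniasm README (context only,
# not necessarily the exact commands/options used in the notebook):
#   minimap2 -x ava-pb -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
#   miniasm -f reads.fq reads.paf.gz > reads.gfa

# Convert GFA segment lines to FASTA.
# Segment lines are tab-separated: S <name> <sequence> [optional tags...]
gfa_to_fasta() {
  awk -F '\t' '$1 == "S" && $3 != "*" { print ">" $2; print $3 }' "$1"
}
```

Usage: `gfa_to_fasta reads.gfa > reads_unitigs.fasta`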

# Genome Assembly – Olympia oyster PacBio minimap/miniasm/racon

In this GitHub Issue, Steven had suggested I try out the minimap/miniasm/racon pipeline for assembling our Olympia oyster PacBio data.

I followed the pipeline described by this paper: http://matzlab.weebly.com/uploads/7/6/2/2/76229469/racon.pdf.

This notebook entry just contains the initial minimap execution. Followed up with miniasm and then racon.

Jupyter Notebook (GitHub): 20170907_docker_pacbio_oly_minimap2.ipynb
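The all-vs-all overlap step writes PAF, a tab-separated format whose first 12 columns are fixed (query name, length, start, end, strand, target name, length, start, end, residue matches, alignment block length, mapping quality). A minimal sketch, with `paf_summary` as a hypothetical helper name, that counts non-self overlaps and reports pooled identity (total residue matches divided by total alignment block length):

```shell
#!/usr/bin/env bash
set -eu

# Summarize a PAF file (or stdin if no file is given).
# Columns used: 1 query name, 6 target name, 10 residue matches,
# 11 alignment block length.
paf_summary() {
  awk -F '\t' '
    $1 != $6 { n++; matches += $10; block += $11 }   # skip self-overlaps
    END {
      ident = (block > 0) ? matches / block : 0
      printf "%d overlaps\t%.3f pooled identity\n", n, ident
    }' "${1:--}"
}
```

For a gzipped overlap file: `zcat reads.paf.gz | paf_summary`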

# Project Progress – Olympia Oyster Genome Assemblies by Sean Bennett

Here’s a brief overview of what Sean has done on the Oly genome assembly front.

Metassembler

• Assemble his BGI assembly and Platanus assembly? Confusing terms here; not sure what he means.
• Failed due to 32-bit vs. 64-bit installation of MUMmer. He didn’t have the chance to re-compile MUMmer as 64-bit. However, a recent MUMmer announcement suggests that MUMmer can now handle genomes of unlimited size.
• I believe he was planning on using (or was using?) GARM, which relies upon MUMmer and may also include a version of MUMmer (outdated version that led to Sean’s error message?).
• Notebook entry

Canu

Redundans

Platanus

# RNA Isolation – Olympia oyster gonad tissue in paraffin histology blocks

My previous go at this was a little premature – I didn’t wait for Laura to fully annotate her slides/blocks. Little did I know, the tissue was mostly visceral mass and, as such, I didn’t hit much in the way of actual gonad tissue. So, I’m redoing this, now that Grace has gone through and annotated the blocks to point out gonad tissue. SN-10-16 was sent to Katherine Silliman on 20170720.

Isolated RNA from Olympia oyster gonad previously preserved with the PAXgene Tissue Fixative and Stabilizer and then embedded in paraffin blocks. See Laura’s notebook for full details on samples and preservation.

RNA was isolated from the following samples using the PAXgene Tissue RNA Kit (Qiagen). I gouged ~10mg samples from each of the blocks and processed them according to the protocol for isolating RNA from blocks of paraffin-embedded tissues.

Background on all of this is in this GitHub Issue

NF-10-22
NF-10-23
NF-10-24
NF-10-26
NF-10-28
NF-10-30
SN-10-16
SN-10-17
SN-10-20
SN-10-25
SN-10-26
SN-10-31

IMPORTANT:

• Prior to beginning, I prepared an aliquot of Buffer TR1 by adding 40μL of β-mercaptoethanol (β-ME) to 4000μL of Buffer TR1.
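The volumes above work out to 10μL of β-ME per 1mL of Buffer TR1 (a 1:100 ratio). A tiny sketch, with `bme_ul` as a hypothetical helper name and that ratio taken from these volumes, for scaling the β-ME to a different amount of Buffer TR1:

```shell
#!/usr/bin/env bash
set -eu

# β-ME volume (μL) for a given Buffer TR1 volume (μL),
# using the 40μL per 4000μL ratio above (i.e. TR1 volume / 100).
bme_ul() {
  awk -v tr1="$1" 'BEGIN { printf "%g\n", tr1 / 100 }'
}
```

For example, `bme_ul 4000` reproduces the 40μL used here.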

Isolated RNA according to the PAXgene Tissue RNA Kit protocol with the following alterations:

• “Max speed” spins were performed at 20,000g.
• Tissue disruption was performed by adding ~25-50 glass beads (425 – 600μm diameter) with the Disruptor Genie @ 45C for 15mins (in the Friedman Lab).
• Shaking incubation step was performed with Disruptor Genie
• Samples were eluted with 27μL of Buffer TR4 x 2, incubated @ 65C for 5mins, immediately placed on ice.

Results:

Samples were not quantified because we lack the proper RNA Qubit assay AND the computer that our NanoDrop1000 is hooked up to is dead. Will have Katherine Silliman perform quantification.

Samples were stored at -80C temporarily.

Samples will be sent to Katherine Silliman for high-throughput library construction and sequencing once I hear back from her regarding her availability to receive the samples.

# RNA Isolation – Olympia oyster gonad tissue in paraffin histology blocks

UPDATE 20170712: The RNA I isolated below is from incorrect regions of tissue. I misunderstood exactly what this tissue was, and admittedly, jumped the gun. The tissue is actually collected from the visceral mass – which contains gonad (a small amount) and digestive gland (a large amount). The RNA isolated below will be stored in one of the Shellfish RNA boxes and I will isolate RNA from the correct regions indicated by Grace.

Isolated RNA from Olympia oyster gonad previously preserved with the PAXgene Tissue Fixative and Stabilizer and then embedded in paraffin blocks. See Laura’s notebook for full details on samples and preservation.

RNA was isolated from the following samples using the PAXgene Tissue RNA Kit (Qiagen). I gouged ~10mg samples from each of the blocks and processed them according to the protocol for isolating RNA from blocks of paraffin-embedded tissues.

Tissue identification is available in this GitHub Issue

NF-10-22
NF-10-23
NF-10-24
NF-10-26
NF-10-28
NF-10-30
SN-10-16
SN-10-17
SN-10-20
SN-10-25
SN-10-26
SN-10-31

IMPORTANT:

• Prior to beginning, I prepared an aliquot of Buffer TR1 by adding 40μL of β-mercaptoethanol (β-ME) to 4000μL of Buffer TR1.
• Reconstituted DNase I with 550μL of RNase-free H2O. Aliquoted in 100μL volumes and stored @ -20C in the “-20C Kit Components” box.

Isolated RNA according to the PAXgene Tissue RNA Kit protocol with the following alterations:

• “Max speed” spins were performed at 20,000g.
• Tissue disruption was performed by adding ~25-50 glass beads (425 – 600μm diameter) with the Disruptor Genie @ 45C for 15mins (in the Friedman Lab).
• Shaking incubation step was performed with Disruptor Genie
• Samples were eluted with 27μL of Buffer TR4 x 2, incubated @ 65C for 5mins, immediately placed on ice and quantified on the Roberts Lab Qubit 3.0 with the RNA High Sensitivity Assay (ThermoFisher Scientific) using 5μL of each sample.

Results:

Well, the good news is that there’s RNA from all the samples and it seems to be in relatively high concentrations!

The bad news is that the concentrations for 10 of the 12 samples were too high and outside the range of the Qubit RNA HS Assay! Since we don’t have the broad range RNA assay, I can’t properly quantify the remaining samples. However, these samples are being sent to Katherine Silliman at some point, so I’ll leave it up to her to quantify the samples. I’m also guessing that she’ll run them on a Bioanalyzer to assess their integrity prior to beginning library construction, so that will also yield concentrations for the samples.

Samples were stored at -80C temporarily.

Samples will be sent to Katherine Silliman for high-throughput library construction and sequencing once I hear back from her regarding her availability to receive the samples.

# Data Management – SRA Submission Olympia Oyster UW PacBio Data from 20170323

Submitted the FASTQ files from the UW PacBio Data from 20170323 to the NCBI sequence read archive (SRA).

FTP’d the data to NCBI’s servers, following their instructions. Briefly,

Change to the directory where the FASTQ files are (Owl/web/nightingales/O_lurida) and then initiate an FTP session:

```shell
ftp -i ftp-private.ncbi.nlm.nih.gov
```

Enter the provided username/password, change to my designated uploads directory, and create a new folder dedicated to this particular upload. Then, upload all the files using the mput command:

```shell
mput *filtered_subreads*
```
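The same upload can be scripted instead of typed interactively by feeding ftp a batch of commands. A minimal sketch, assuming a netkit/BSD-style `ftp` client where `-n` disables auto-login so credentials come from a `user` command. `make_ftp_batch` and all of its arguments are hypothetical placeholders; use NCBI's provided username, password, and upload directory.

```shell
#!/usr/bin/env bash
set -eu

# Print an ftp command batch that logs in, creates a folder for this
# upload, and sends every local file matching <pattern>.
# Usage: make_ftp_batch <user> <password> <upload_dir> <new_dir> <pattern>
make_ftp_batch() {
  local user="$1" pass="$2" upload_dir="$3" new_dir="$4" pattern="$5"
  printf 'user %s %s\n' "$user" "$pass"
  printf 'binary\n'
  printf 'cd %s\n' "$upload_dir"
  printf 'mkdir %s\n' "$new_dir"
  printf 'cd %s\n' "$new_dir"
  printf 'mput %s\n' "$pattern"   # ftp expands the glob in the local directory
  printf 'bye\n'
}
```

Usage, from the directory holding the FASTQ files: `make_ftp_batch "$NCBI_USER" "$NCBI_PASS" uploads/YOUR_DIR YOUR_NEW_FOLDER '*filtered_subreads*' | ftp -i -n ftp-private.ncbi.nlm.nih.gov`. Quoting the pattern keeps the shell from expanding it, and `-i` stops mput from prompting per file. Passing the password as an argument exposes it in the process list, so a credentials file may be preferable.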

SRA deets are below (assigned the FASTQ files to the existing BioProject and created a new BioSample). Will update post with SRA number when processing is complete on the NCBI end.

SRA Sample: SRS2339870
SRA Run: SRR5809355
BioProject: PRJNA316624
BioSample: SAMN07326085