Tag Archives: Panopea generosa

Data Management – Integrity Check of Final BGI Olympia Oyster & Geoduck Data

After completing the downloads of these files from BGI, I needed to verify that the downloaded copies matched the originals. Below is a Jupyter Notebook detailing how I verified file integrity via MD5 checksums. It also highlights the importance of doing this check when working with large sequencing files (or just large files in general), as a few of them had mismatching MD5 checksums!
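The notebook holds the actual commands used. As a rough illustration of the same idea, here’s a minimal Python sketch that recomputes MD5 checksums and compares them against an md5sum-style checksum file. It assumes each line of the checksum file looks like `<checksum>  <filename>`, with paths relative to the checksum file’s directory. The function names `md5sum` and `check_md5_file` are hypothetical and are not taken from the notebook.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    large sequencing files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5_file(md5_file):
    """Compare each '<checksum>  <filename>' line (assumed format) against a
    freshly computed checksum. Returns the names of files that are missing
    or whose checksum does not match."""
    base = Path(md5_file).parent
    failures = []
    for line in Path(md5_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = base / name
        if not target.exists() or md5sum(target) != expected.lower():
            failures.append(name)
    return failures
```

For example, `check_md5_file("md5.txt")` returns an empty list when every file matches; any filename it returns is a candidate for re-downloading. Reading in 1 MB chunks keeps memory use flat regardless of file size.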

Although the notebook is embedded below, it may be easier to view via the notebook link (hosted on GitHub).

In the end, I had to re-download some files, but all the MD5 checksums now match and these data are ready for analysis:

Final Ostrea lurida genome files

Final Panopea generosa genome files

Jupyter Notebook: 20161214_docker_BGI_data_integrity_check.ipynb


Data Management – Download Final BGI Genome & Assembly Files

We received instructions from BGI for downloading the final data and genome assembly files for geoduck and Olympia oyster.

In total, the downloads took a little over three days to complete!

The notebook detailing how the files were downloaded is below. Note that I had to strip the output cells: the output from the download command made the notebook file too large to upload to GitHub, and the file was so large that it consistently crashed the browser/computer it was opened in. So, the notebook below is here for posterity.
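The post doesn’t say which tool was used to strip the outputs. As a sketch of what stripping involves: a notebook file is just JSON, so clearing outputs amounts to emptying each code cell’s `outputs` list. This assumes the nbformat 4 layout (a top-level `cells` list), and `strip_notebook_outputs` is a hypothetical helper that uses only the standard library.

```python
import json
from pathlib import Path


def strip_notebook_outputs(src, dst):
    """Write a copy of a Jupyter notebook (assumed nbformat 4 JSON) with every
    code cell's outputs and execution count cleared. Markdown cells are left
    untouched. Returns the stripped notebook dict."""
    notebook = json.loads(Path(src).read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []             # drop the (huge) download logs
            cell["execution_count"] = None   # reset the In [n] counter
    Path(dst).write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False), encoding="utf-8"
    )
    return notebook
```

Writing to a separate `dst` path keeps the original, output-laden notebook intact in case the logs are needed later.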

Jupyter Notebook: 20161206_docker_BGI_genome_downloads.ipynb



Data Management – Geoduck Small Insert Library Genome Assembly from BGI

Received another set of Panopea generosa genome assembly data from BGI back in May! I neglected to create MD5 checksums and a readme file for this data set. Of course, I needed some of the info that the readme file should’ve had, and it wasn’t there. So, here’s the skinny…

These data were assembled from the small insert libraries that BGI created for this project.

All data is stored here: http://owl.fish.washington.edu/P_generosa_genome_assemblies_BGI/20160512/

They’ve provided a Genome Survey (PDF) with some information about the data they’ve assembled. It includes the estimated genome size:

Geoduck genome size: 2972.9 Mb

Additionally, there’s a table breaking down the N50 distributions of scaffold and contig sizes.
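For reference, N50 is the length such that sequences at least that long together contain at least half of all assembled bases. Here’s a minimal Python sketch of the calculation. The function name `n50` is hypothetical; it is not taken from BGI’s report.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length  # accumulate bases from the longest sequence down
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")
```

For example, `n50([2, 3, 4, 5, 6, 8])` returns 6: the total is 28 bases, and 8 + 6 = 14 reaches half.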

Data management stuff was performed in a Jupyter (iPython) notebook; see below.

Jupyter Notebook: 20161025_Pgenerosa_Small_Library_Genome_Read_Counts.ipynb


Dissection – Frozen Geoduck & Pacific Oyster

We’re working on a project with Micah Horwith of the Washington Department of Natural Resources (DNR) to identify potential proteomic biomarkers in geoduck (Panopea generosa) and Pacific oyster (Crassostrea gigas). One aspect of the project is determining how best to sample juvenile geoduck and Pacific oyster so as to minimize changes in the ctenidia proteome during sampling. Generally, live animals are shucked, the tissue is dissected, and then the tissue is “snap” frozen. However, Micah’s crew will be collecting animals from wild sites around Puget Sound and, because of the remote locations and the means of collection, will have limited tools and time to perform this type of sampling. Time is a significant factor and will have a great impact on the proteomic status of each individual.

As such, Micah and crew wanted to try a different means of sampling that would help preserve the state of the proteome at the time of collection. They collected some juveniles of both species and “snap” froze them in the field in a dry ice/ethanol bath, in hopes of best preserving the status of the ctenidia proteome. I’m attempting to dissect the frozen ctenidia tissue from both species and am reporting on the success/failure of this preservation/sampling protocol.

To test this, I transferred the animals (still in their baggies) from the -80C freezer to dry ice. Utensils and weigh boats were pre-cooled on dry ice.



Quick summary: This method won’t work, and I think sampling will have to take place in the field.

The details of why this won’t work (along with images of the process) are below.


The first issue with this sampling method (worth noting because I believe dry ice/ethanol baths will be used even with a different sampling methodology) is that the ethanol in the dry ice bath can wash the labels off the baggies at the time of collection. Notice in the image below that the label on the geoduck baggie (the baggie on the left) has, for all intents and purposes, completely washed off:



Starting with C. gigas, opening the animal was relatively easy. Granted, the animal had become brittle, but accessing and identifying the tissues ended up being pretty easy:





However, dissecting out just the ctenidia is a lost cause. The mantle and the ctenidia are essentially fused together into a frozen block throughout the oyster. Although the image below might look like part of the shell, it is not; it is strictly a chunk of frozen ctenidia/mantle tissue:




The geoduck were even more difficult. In fact, I couldn’t even manage to remove the soft tissue from the shell (for the uninitiated, there are two geoduck in the image below). I only managed to crush most of the tissue contained within the shell, making it even more impossible (if that’s possible) to identify and dissect out the ctenidia:





SRA Release – Transcriptomic Profiles of Adult Female & Male Gonads in Panopea generosa (Pacific geoduck)

The RNAseq data that I previously submitted to the NCBI Sequence Read Archive (SRA) were released to the public today. Here are the various links for the project:

Study: SRP072283 (http://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP072283)

BioProject: PRJNA316216 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA316216)

Study: SRP072283 (http://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP072283)

Female Pool Experiment: SRX1659865 (http://www.ncbi.nlm.nih.gov/sra/SRX1659865)

Male Pool Experiment: SRX1659866 (http://www.ncbi.nlm.nih.gov/sra/SRX1659866)


SRA Submission – Genome sequencing of the Pacific geoduck (Panopea generosa)

Adding our geoduck genome sequencing data (sequencing done by BGI) to the NCBI Sequence Read Archive (SRA). The current status can be seen in the screen cap below. The release date is set for a year from now, but we will likely move it up. Need Steven to review the submission details (BioProject, Experiment descriptions, etc.) before I initiate the public release. Will update this post with the SRA accession number once we receive it.

Here’s the list of files uploaded to the SRA:


Mate pair sequencing files were uploaded together within a single “Run”.


SRA Submission – Transcriptomic Profiles of Adult Female & Male Gonads in Panopea generosa (Pacific geoduck)

This RNAseq experiment is part of a larger project characterizing geoduck gonad development across multiple stages: histologically, proteomically, and transcriptomically. Initial sample collection was performed by Grace Crandall.

The current status can be seen in the screen cap below. The release date is currently set for a year from now, but we will likely move it up. Need Steven to review the submission details (BioProject, Experiment descriptions, etc.) before I initiate the public release. Will update this post with the SRA accession number once we receive it.

Here’s the list of files uploaded to the SRA:


Mate pair sequencing files were uploaded together within a single “Run”.


Data Received – Initial Geoduck Genome Assembly from BGI

The initial assembly of the Panopea generosa genome is available from BGI. For now, we’ve stashed it here:


The data provided consisted of the following three files:

  • md5.txt
  • N50.txt
  • scaffold.fa.fill

md5.txt – Checksum file to verify integrity of files after downloading.

N50.txt – Contains some very limited stats on scaffolds provided.

scaffold.fa.fill – A FASTA file of scaffolds. Since these are scaffolds (and NOT contigs!), many regions contain runs of Ns (NNNNNN) that were inserted as gaps when contigs were joined into scaffolds based on paired-end spatial information. As such, the N50 information is not as useful as it would be if these were contigs, because those gap Ns count toward the scaffold lengths.
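To see how much of each scaffold is gap, one can split scaffolds at runs of N and look at the remaining contig pieces. Below is a minimal sketch, assuming any run of N (or n) marks a gap; `read_fasta`, `contig_lengths_from_scaffold`, and `gap_fraction` are hypothetical helpers, not part of BGI’s deliverables.

```python
import re


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contig_lengths_from_scaffold(sequence, min_gap=1):
    """Split a scaffold at runs of at least `min_gap` Ns (assumed gap marker)
    and return the lengths of the non-empty pieces between gaps."""
    pieces = re.split(r"[Nn]{%d,}" % min_gap, sequence)
    return [len(piece) for piece in pieces if piece]


def gap_fraction(sequence):
    """Fraction of a scaffold's characters that are N gap placeholders."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base in "Nn") / len(sequence)
```

Passing the resulting contig lengths to an N50 calculation gives a contig-level figure to compare with the scaffold stats in N50.txt. Raising `min_gap` ignores short N runs, on the assumption that those may be ambiguous base calls rather than scaffolding gaps; how BGI actually marks gaps isn’t stated in the files.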

Additional assemblies will be provided at some point. I’ve emailed BGI about what we should expect from this initial assembly and what subsequent assemblies should look like.


Data Received – Panopea generosa genome sequencing files from BGI

Downloaded data from the BGI project portal to our server, Owl, using the Synology Download Station. Although the BGI portal is aesthetically nice, it’s poorly set up for bulk downloads, and it took a few tries to download all of the files.

Data integrity was assessed and read counts for each file were generated. The files were moved to their permanent storage location on Owl: http://owl.fish.washington.edu/nightingales/P_generosa/
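The read counts came from the notebook linked at the end of this post. One simple way to get them, sketched here under the assumption that every file is a standard four-lines-per-record FASTQ (gzipped or not), is to count lines and divide by four. `count_fastq_reads` is a hypothetical helper, not the notebook’s code.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file (optionally .gz), assuming the standard
    four-line record layout: header, sequence, '+', qualities."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4:
        # A partial record usually means a truncated or corrupted download.
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4
```

Summing `count_fastq_reads` over every file gives a project-wide total; the multiple-of-four check also doubles as a cheap truncation test alongside the MD5 comparison.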

The readme.md file was updated to include project/file information.

The file manipulations were performed in a Jupyter notebook (see below).


Total reads generated for this project: 1,208,635,950

BGI provided us with the raw data files to play around with, but they are also currently in the process of performing the genome assembly.


Jupyter Notebook file: 20160126_Olurida_BGI_data_handling.ipynb

Notebook Viewer: 20160126_Olurida_BGI_data_handling.ipynb