# Project Summary: Maturation processes in the marine mollusc, Panopea generosa

Work Accomplished

The overall goal of this project is to develop a fundamental understanding of processes controlling marine mollusc reproductive maturation. In order to accomplish this goal, the specific research objectives of this proposal were to 1) characterize tissue-specific transcriptomic resources for the geoduck and 2) identify proteins that play a role in geoduck reproductive maturation.

The first step in this project was collecting clams at different reproductive stages as determined through histological analysis. Gonadal tissue from 70 geoducks was sampled in batches of about eight per week over the span of two months from November 2014 to early January 2015. Hundreds of images were analyzed and reproductive status was determined for each individual.

Based on histological determination of reproductive maturational stage, seven female and six male paraffin-embedded gonad samples were selected for construction of RNA-seq libraries. A total of 443,468,476 reads were obtained and the de novo assembly resulted in a total of 153,982 transcript contigs with a mean contig length of 660 bp and an N50 value of 1015 bp. Comparing our contigs with oyster sequences whose expression changed during gonad development yielded 161 matches. These include geoduck sequences corresponding to genes expressed in early gonad developmental stages (7), genes with increasing expression during spermatogenesis (44), genes with increasing expression during oogenesis (31), and genes with varying expression levels during gonadogenesis in both sexes (79).

Proteomic profiles were determined for the primary reproductive maturation stages in both male and female clams using data dependent acquisition (DDA) of gonad proteins. This approach yielded 3,627 detected proteins across both sexes and three maturation stages, substantially expanding what is known about proteomic changes across maturation stages in marine mollusks. Based on the DDA data, 27 proteins from early- and late-stage male and female clams were chosen for selected reaction monitoring (SRM). The SRM assay yielded a suite of indicator peptides that can be used as an efficient assay to non-lethally determine geoduck gonad maturation status.

Non-metric multidimensional scaling plot (NMDS) of geoduck gonad whole proteomic profiles generated by data dependent acquisition. Gonad proteomes differ among clams by both sex (male = orange, female = blue) and stage (early-stage = circles, mid-stage = squares, late-stage = triangles; p<0.05).

Impact of Award

Beyond contributing to the fundamental knowledge of marine mollusk reproduction, this award produced numerous publications and provided a basis for further funding and proposal submissions. In addition, the transcriptomic data were the basis for the course Bioinformatics for Transcriptomic and Epigenomic Analyses, held at the Centro de Investigación Científica y de Educación Superior de Ensenada, B.C. (CICESE), 19-24 October 2015.

Further Funding

Currently two projects based on this project have been funded, and others have been submitted. Funded projects include: Proteomic response of shellfish to environmental stress (Department of Natural Resources, $107,805) and Elucidating the physiological and epigenetic response of tetraploid and triploid Pacific Oysters to environmental stressors (NOAA, $178,898). Submitted proposals include one to NOAA on the development of new clam species for aquaculture.

Publications

Crandall, Grace; Roberts, Steven (2016): Reproductive Maturation in Geoduck clams (Panopea generosa). figshare.
https://dx.doi.org/10.6084/m9.figshare.3205975.v1
Retrieved: 14:41, Dec 23, 2016 (GMT)
This fileset includes a research paper describing reproductive maturation in geoduck clams with 200 images of gonadal histological sections and associated datasheets. Downloads = 1761.

Emma B. Timmins-Schiffman, Grace A. Crandall, Brent Vadopalas, Michael E. Riffle, Brook L. Nunn, Steven B. Roberts (2016) Integrating proteomics and selected reaction monitoring to develop a non-invasive assay for geoduck reproductive maturation
bioRxiv 094615; doi: https://doi.org/10.1101/094615

[Data] Transcriptomic profiles of adult female & male gonads in Panopea generosa (Pacific geoduck).
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA316216

Panopea gonad transcriptome
Open Science Framework Project
https://osf.io/3xf6m/

[Data] Geoduck (Panopea generosa) gonad DDA LC-MS/MS
https://www.ebi.ac.uk/pride/archive/projects/PXD003127

[Code] Source Code for GO Analysis in Geoduck Gonad Background
https://github.com/yeastrc/compgo-geoduck-public

[Data] Geoduck (Panopea generosa) gonad DIA LC-MS/MS
https://www.ebi.ac.uk/pride/archive/projects/PXD004921

[Data] Selected reaction monitoring of geoduck gonad peptides to develop biomarkers of reproductive maturation status
http://www.peptideatlas.org/PASS/PASS00943

[Data] Selected reaction monitoring of geoduck hemolymph peptides to develop biomarkers of reproductive maturation status
http://www.peptideatlas.org/PASS/PASS00942

# Getting back on tracks

Yesterday I uploaded v0.0.1 of the Geoduck genome to CoGe.

Now I want to start adding tracks. To do this I used CLC to create RNA-seq tracks from our male and female gonad transcriptome data.

As expected, only a small fraction of reads mapped, because we are limiting the genome to the 22 scaffolds with length > 100k.
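The scaffold cutoff described above can be sketched in Python. This is only an illustration of the idea, not how CoGe or CLC selected the scaffolds; the function names `read_fasta` and `long_scaffolds` are hypothetical, and the input is assumed to be plain FASTA text.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def long_scaffolds(lines: Iterable[str], min_length: int = 100_000) -> Dict[str, int]:
    """Return {name: length} for scaffolds strictly longer than min_length."""
    return {
        name: len(seq)
        for name, seq in read_fasta(lines)
        if len(seq) > min_length
    }


# Tiny in-memory demo with a lowered cutoff (made-up records, not geoduck data).
demo = [">s1", "ACGT", "ACGT", ">s2 short one", "AC"]
print(long_scaffolds(demo, min_length=4))
```

With a real genome file, the same call would take an open file handle in place of `demo` and keep the default 100,000 bp cutoff.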

Males

Females

One thing to point out (and will have to be followed up on) is that many more Female reads mapped back.

I took the Reads data and exported to BAM.

Then uploaded to CoGe.
I called this Version 1, and interestingly I got some cool options.. so I selected them.

This included saving as a Notebook.

This was Finished in less than 5 minutes!

The SNP view.

Voila – we have it in a Browser.

and you can zoom in

Here we have a Notebook view

It is now public, though not quite sure if there is a url.

Everything is public so please give it a look / twirl.

# Second look at Geoduck transcriptome

Last week I popped out a quick assembly and annotation on our geoduck gonadal transcriptome. A second assembly was also done using Trinity.

Updates
August 3 – Confirmed // in file location had no impact on assembly.
July 14 – TransDecoder protein annotations
10:40am – added TransDecoder results
10:29am – added Stats via Trinity

Trinity.pl
--seqType fq
-JM 24G
--left /Volumes/web/cnidarian/Geo_Pool_F_GGCTAC_L006_R1_001_val_1.fq /Volumes/web/cnidarian/Geo_Pool_M_CTTGTA_L006_R1_001_val_1.fq
--right /Volumes/web/cnidarian//Geo_Pool_F_GGCTAC_L006_R2_001_val_2.fq /Volumes/web/cnidarian//Geo_Pool_M_CTTGTA_L006_R2_001_val_2.fq
--CPU 16


## Output

0:999   127840
1000:1999   18164
2000:2999   5321
3000:3999   1817
4000:4999   762
5000:5999   291
6000:6999   135
7000:7999   73
8000:8999   22
9000:9999   29
10000:10999     4
11000:11999     5
12000:12999     3
13000:13999     4
14000:14999     4
15000:15999     3
16000:16999     0
17000:17999     2
18000:18999     1

Total length of sequence:   101862868 bp
Total number of sequences:  154480
N25 stats:          25% of total sequence length is contained in the 8095 sequences >= 2045 bp
N50 stats:          50% of total sequence length is contained in the 26158 sequences >= 1014 bp
N75 stats:          75% of total sequence length is contained in the 64574 sequences >= 446 bp
Total GC count:         37657770 bp
GC %:               36.97 %
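As a rough cross-check of the numbers above, here is a minimal Python sketch of how an Nx statistic such as N50 can be computed from a list of contig lengths, along with the mean length, GC percentage, and histogram total derived from the reported values. The function name `nx` is hypothetical and this is not the script that produced the output; the constants are copied from the output in this post.

```python
def nx(lengths, fraction=0.5):
    """Return (n_sequences, min_length) for the smallest set of longest
    sequences that together hold `fraction` of the total bases."""
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    raise ValueError("no sequence lengths given")


# Values copied from the assembly output in this post.
total_bp = 101862868
n_sequences = 154480
gc_bp = 37657770
bin_counts = [127840, 18164, 5321, 1817, 762, 291, 135, 73, 22, 29,
              4, 5, 3, 4, 4, 3, 0, 2, 1]

mean_length = round(total_bp / n_sequences, 2)   # mean contig length in bp
gc_percent = round(100 * gc_bp / total_bp, 2)    # GC content as a percentage
bins_total = sum(bin_counts)                     # should match n_sequences

print(mean_length, gc_percent, bins_total)
```

Given the actual contig lengths from `Trinity.fasta`, `nx(lengths, 0.5)` would be expected to return the sequence count and length reported on the N50 line.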

    hummingbird:Geo-trinity steven$ /Users/gilesg/compile/trinityrnaseq_r20131110/util/TrinityStats.pl /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta

    ################################
    ## Counts of transcripts, etc.
    ################################
    Total trinity transcripts: 154480
    Total trinity components: 100155
    Percent GC: 36.97

    ########################################
    Stats based on ALL transcript contigs:
    ########################################

    Contig N10: 3444
    Contig N20: 2385
    Contig N30: 1766
    Contig N40: 1343
    Contig N50: 1014

    Median contig length: 371
    Average contig: 659.39
    Total assembled bases: 101862868

    #####################################################
    ## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
    #####################################################

    Contig N10: 2999
    Contig N20: 2026
    Contig N30: 1462
    Contig N40: 1067
    Contig N50: 768

    Median contig length: 321
    Average contig: 553.88
    Total assembled bases: 55473621

Rerunning to see if double slash was a problem - did not see anything in error. Also running TransDecoder.

# TransDecoder Results

Ran the following

    /Users/gilesg/compile/trinityrnaseq_r20131110/trinity-plugins/TransDecoder_r20131110/TransDecoder -t /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta

This provided a peptide file with 36003 sequences.
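One way to double-check a sequence count like this is to count the FASTA header lines in the peptide file. The sketch below does that and also tallies ORF types, assuming each header carries a `type:` field as in the example record in this post. The function name `summarize_pep` is hypothetical.

```python
import re
from collections import Counter


def summarize_pep(lines):
    """Count FASTA records and tally the 'type:' field found in headers."""
    n_records = 0
    orf_types = Counter()
    for line in lines:
        if line.startswith(">"):
            n_records += 1
            match = re.search(r"type:(\S+)", line)
            if match:
                orf_types[match.group(1)] += 1
    return n_records, orf_types


# Usage with a real file (path is whatever the peptide file is called locally):
# with open("Trinity.fasta.transdecoder.pep") as handle:
#     print(summarize_pep(handle))
```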
    !head /Volumes/web-1/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep
    >cds.comp100047_c0_seq2|m.5982 comp100047_c0_seq2|g.5982 ORF comp100047_c0_seq2|g.5982 comp100047_c0_seq2|m.5982 type:internal len:142 (-) comp100047_c0_seq2:3-425(-)
    NAECRDLYKIFTQILSVRSQEGKIVIPDEFATKIRNWLGNKEELFKEAHNQKIITFYNEY
    TREENTFNPIRGKRPMSVPDMPERKYIDQLSRKTQSQCDFCKYKTFTAEDTFGRIDSNFS
    CSASNAFKLDHWHALFLLKTH

Running blastp on Trinity.fasta.transdecoder.pep

    !blastp -query /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep -db /usr/local/bioinformatics/dbs/uniprot_sprot.fasta -evalue 1e-5 -max_target_seqs 1 -max_hsps 1 -outfmt 6 -num_threads 4 -out /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep-blastp-uniprot-2.out

# RNAseq Data Receipt – Geoduck Gonad RNA 100bp PE Illumina

Received notification that the samples sent on 20150601 for RNAseq were completed.

Downloaded the following files from the GENEWIZ servers using FileZilla FTP and stored them on our server (owl/web/nightingales/P_generosa):

• Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz
• Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz
• Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz
• Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz

Generated md5 checksums for each file:

    $ for i in *; do md5 $i >> checksums.md5; done

Made a readme.md file for the directory.
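For reference, the checksum step could also be done in Python with the standard `hashlib` module. This is a sketch rather than what was actually run: the function names are hypothetical, and the output format (`<hash>  <filename>`, as used by `md5sum`) is an assumption that differs from the format printed by the macOS `md5` command.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(directory, out_name="checksums.md5"):
    """Write one '<md5>  <filename>' line per regular file in `directory`.

    The checksum file itself is skipped so reruns stay stable.
    """
    directory = Path(directory)
    files = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name != out_name
    )
    lines = [f"{md5_of_file(p)}  {p.name}" for p in files]
    (directory / out_name).write_text("\n".join(lines) + "\n")
    return lines
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte `fastq.gz` files.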

# Sample Submission – Geoduck Gonad for RNA-seq

Prepared two pools of geoduck RNA for RNA-seq (Illumina HiSeq2500, 100bp, PE) with GENEWIZ, Inc.

I pooled a set of female and a set of male RNAs that had been selected by Steven based on the Bioanalyzer results from Friday.

The female RNA pool used 210ng of each sample, except sample #08, which used 630ng. This was because there weren’t any other female samples from this developmental time point, whereas the two other developmental time points each had three samples contributing to the pool. So, three times the quantity of the other individual samples was used to help equalize the time point contribution to the pooled sample. Additionally, 630ng used the entirety of sample #08.

The male RNA pool used 315ng of each sample. This number differs from the 210ng used for the female RNAs so that the two pools would end up with the same total quantity of RNA. However, now that I’ve typed this, this doesn’t matter since the libraries will be equalized before being run on the Illumina HiSeq2500. Oh well. As long as each sample in each pool contributed to the total amount of RNA, then it’s all good.
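The pooling arithmetic can be checked with a few lines of Python. This sketch assumes seven female samples (six at 210ng plus sample #08 at 630ng, following the description above) and six male samples at 315ng each. The male sample count is not stated in this entry and is taken from the project summary, so treat it as an assumption.

```python
# Female pool: two time points with three samples each at 210 ng,
# plus sample #08 alone at its time point at 630 ng.
female_ng = [210] * 6 + [630]

# Male pool: assumed six samples at 315 ng each.
male_ng = [315] * 6

female_total = sum(female_ng)
male_total = sum(male_ng)

# Contribution per female time point: 3 x 210 ng, 3 x 210 ng, and 1 x 630 ng.
female_per_time_point = [3 * 210, 3 * 210, 630]

print(female_total, male_total, female_per_time_point)
```

Under these assumptions both pools come to the same total, and each female time point contributes the same quantity.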

The two pools were shipped O/N on dry ice.

• Geo_pool_M
• Geo_pool_F

Calculations (Google Sheet): 20150601_Geoduck_GENEWIZ_calcs

# Bioanalyzer – Geoduck Gonad RNA Quality Assessment

Before proceeding with transcriptomics for this project, we need to assess the integrity of the RNA via Bioanalyzer.

RNA that was previously isolated on 20150508, 20150505, 20150427, and 20150424 was consolidated into single samples (where there had been multiple isolations of the same sample). Those notebook entries have been updated to report this consolidation and link to this notebook entry. The consolidated samples were spec’d on the Roberts Lab NanoDrop1000:

Google Sheet: 20150528_geoduck_histo_RNA_ODs

NOTE: Screwed up consolidation of Geoduck Block 03 sample (added one of the 04 dupes to the tube, so discarded 03).

RNA was stored in Shellfish RNA Box #5.

RNA was submitted to Jesse Tsai at University of Washington Department of Environmental and Occupational Health Science Functional Genomics Laboratory for running on the Agilent Bioanalyzer 2100, using either the RNA Pico or RNA Nano chips, depending on RNA concentration (Pico for lower concentrations and Nano for higher concentrations – left decision up to Jesse).

Results:

Bioanalyzer 2100 Pico Data File (XAD): SamWhite_Eukaryote Total RNA Pico_2015-05-28_12-50-00.xad
Bioanalyzer 2100 Nano Data File (XAD): SamWhite_Eukaryote Total RNA Nano_2015-05-28_13-22-53.xad

### Nano Electropherogram

Jesse alerted me to the fact that they did not have any ladder to use on the Nano chip, as someone had used the remainder, but failed to order more. I OK’d him to go ahead with the Nano chip despite lacking ladder, as we primarily needed to assess RNA integrity.

Bad Samples:

• Geo 04 – No RNA detected
• Geo 65, 67, 68 – These three samples show complete degradation of the RNA (i.e. no ribosomal band present, significant smearing on the gel representation).

All other samples look solid. Will discuss with Steven and Brent on how they want to proceed.

Full list of samples for this project (including the Block 03 sample not included in this analysis; see above). Grace’s notebook will have details on what the numbering indicates (e.g. developmental stage).

• block 02
• block 03 (no RNA)
• block 04 (no RNA)
• block 07
• block 08
• block 09
• block 34
• block 35
• block 38
• block 41
• block 42
• block 46
• block 51
• block 65 (degraded RNA)
• block 67 (degraded RNA)
• block 68 (degraded RNA)
• block 69
• block 70

# RNA Isolation – Geoduck Gonad in Paraffin Histology Blocks

UPDATE 20150528: The RNA isolated in this notebook entry may have been consolidated on 20150528.

The RNA isolation I performed earlier this week proved to be better for some of the samples (scraping tissue directly from the blocks), but still exhibited low yields from some samples. I will perform a final RNA isolation attempt (the kit only has six columns left) from the following samples:

• 02
• 03
• 04
• 07
• 08
• 09

Instead of full sections from each histology cassette, I gouged samples directly from the tissue in each of the blocks to maximize the amount of tissue input.

IMPORTANT:

Samples were then processed with the PAXgene Tissue RNA Kit in a single group.

Isolated RNA according to the PAXgene Tissue RNA Kit protocol with the following alterations:

• “Max speed” spins were performed at 19,000g.
• Tissue disruption was performed with the Disruptor Genie @ 45C for 15mins.
• Samples were eluted with 40μL of Buffer TR4, incubated @ 65C for 5mins, immediately placed on ice and quantified on the Roberts Lab NanoDrop1000.

All samples were stored @ -80C in Shellfish RNA Box #5.

Results:

Two samples (02 and 07) produced great yields and perfect RNA (260/280 and 260/230 of ~2.0). The remainder of the samples showed little improvement compared to what I’ve been obtaining from the previous three attempts. Will discuss with Steven and Brent about how to proceed with this project.

# RNA Isolation – Geoduck Gonad in Paraffin Histology Blocks

UPDATE 20150528: The RNA isolated in this notebook entry may have been consolidated on 20150528.

Last week’s RNA isolation failed for more than half of the samples I processed. I will re-isolate RNA from the following samples:

• 02
• 03
• 04
• 07
• 08
• 09
• 35
• 38
• 46
• 65
• 67
• 68

IMPORTANT:

Five 5μm sections were taken from each block. A new blade was used for each block.

Samples were then processed with the PAXgene Tissue RNA Kit in two groups of six.

Isolated RNA according to the PAXgene Tissue RNA Kit protocol with the following alterations:

• “Max speed” spins were performed at 19,000g.
• Tissue disruption was performed with the Disruptor Genie @ 45C for 15mins.
• Samples were eluted with 40μL of Buffer TR4, incubated @ 65C for 5mins, immediately placed on ice and quantified on the Roberts Lab NanoDrop1000.

Results:

Well, these results are very consistent with the data from the last isolation performed on these samples. This fact suggests that the problem lies with the tissue samples and not the isolation (since the isolation has been performed two separate times on these same samples and the results have come out virtually identical both times).

All samples with concentrations < 5ng/μL were discarded. The remaining samples were stored @ -80C in Shellfish RNA Box #5:

• 35
• 38
• 65
• 67
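The keep/discard rule above (discard anything below 5ng/μL) is simple enough to express as a small Python helper. The function name and the demo values below are hypothetical placeholders, not the measured concentrations, which live in the lab's spreadsheet.

```python
def split_by_concentration(concentrations, min_ng_per_ul=5.0):
    """Split {sample: ng/uL} into (kept, discarded) using the cutoff.

    Samples strictly below the cutoff are discarded; samples at or
    above it are kept.
    """
    kept = {s: c for s, c in concentrations.items() if c >= min_ng_per_ul}
    discarded = {s: c for s, c in concentrations.items() if c < min_ng_per_ul}
    return kept, discarded


# Placeholder values for illustration only (not real NanoDrop readings).
demo = {"sample_A": 12.5, "sample_B": 4.9, "sample_C": 5.0}
kept, discarded = split_by_concentration(demo)
print(sorted(kept), sorted(discarded))
```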

Will discuss with Steven, look at Grace’s notebook to review the preservation process for these samples, and review the PAXgene Tissue RNA Kit to see if it will accommodate a greater number of microtome sections to use for isolation.

# RNA Isolation – Geoduck Gonad in Paraffin Histology Blocks

UPDATE 20150528: The RNA isolated in this notebook entry may have been consolidated on 20150528.

Isolated RNA from geoduck gonad previously preserved with the PAXgene Tissue Fixative and Stabilizer and then embedded in paraffin blocks. See Grace’s notebook for full details on samples and preservation.

RNA was isolated from the following geoduck sample blocks using the PAXgene Tissue RNA Kit (Qiagen):

• 02
• 03
• 04
• 07
• 08
• 09
• 35
• 38
• 41
• 46
• 51
• 65
• 67
• 68
• 69
• 70

IMPORTANT:

Five 5μm sections were taken from each block. A new blade was used for each block.

Samples were then processed with the PAXgene Tissue RNA Kit in two groups of eight.

Isolated RNA according to the PAXgene Tissue RNA Kit protocol with the following alterations:

• “Max speed” spins were performed at 19,000g.
• Tissue disruption was performed with the Disruptor Genie @ 45C for 15mins.
• Samples were eluted with 40μL of Buffer TR4, incubated @ 65C for 5mins, immediately placed on ice and quantified on the Roberts Lab NanoDrop1000.

Results:

Well, these results are certainly not good.

The first set of eight samples I processed yielded no RNA (except #38, which is only marginally better than nothing). All the samples (excluding #38) have been discarded.

The second set of eight samples I processed range from amazing to poor (#68 was barely worth keeping).

I’ll review the protocol, but at the moment I’m at a loss to explain why the first set of eight samples came up empty. Will perform another isolation on these blocks on Monday. Grrrrr.

Samples were stored at -80C in Shellfish RNA Box #5.

# Bioanalyzer Submission – Geoduck Gonad RNA from Histology Blocks

Submitted 3μL (~75ng) of RNA from each of the two gonad samples isolated from foot tissue embedded in paraffin histology blocks 20150408 (to assess quality of RNA) to Jesse Tsai at University of Washington Department of Environmental and Occupational Health Science Functional Genomics Laboratory:

• Geoduck Block 34
• Geoduck Block 42

Jesse will determine if the samples should be run on the RNA Pico or the RNA Nano chips.