Category Archives: Panopea generosa

Panopea generosa

Project Summary: Maturation processes in the marine mollusc, Panopea generosa

Work Accomplished

The overall goal of this project is to develop a fundamental understanding of processes controlling marine mollusc reproductive maturation. In order to accomplish this goal, the specific research objectives of this proposal were to 1) characterize tissue-specific transcriptomic resources for the geoduck and 2) identify proteins that play a role in geoduck reproductive maturation.

The first step in this project was collecting clams at different reproductive stages as determined through histological analysis. Gonadal tissue from 70 geoducks was sampled in batches of about eight per week over the span of two months from November 2014 to early January 2015. Hundreds of images were analyzed and reproductive status was determined for each individual.

Based on histological determination of reproductive maturational stage, seven female and six male paraffin-embedded gonad samples were selected for construction of RNA-seq libraries. A total of 443,468,476 reads were obtained, and the de novo assembly resulted in 153,982 transcript contigs with a mean contig length of 660 bp and an N50 value of 1015 bp. Comparing our contigs with oyster sequences whose expression changed during gonad development yielded 161 matches. These included geoduck sequences corresponding to genes expressed in early gonad developmental stages (7), genes with increasing expression during spermatogenesis (44), genes with increasing expression during oogenesis (31), and genes with varying expression levels during gonadogenesis in both sexes (79).

Proteomic profiles were determined for the primary reproductive maturation stages in both male and female clams using data dependent acquisition (DDA) of gonad proteins. This approach yielded 3,627 detected proteins across both sexes and three maturation stages, a substantial advance in the understanding of proteomic responses across maturation stages of marine mollusks. Based on the DDA data, 27 proteins from early- and late-stage male and female clams were chosen for selected reaction monitoring (SRM). The SRM assay yielded a suite of indicator peptides that can be used as an efficient, non-lethal assay to determine geoduck gonad maturation status.

Timmins-Schiffman-JPRv4-sr_docx_1E0D8BEB.png

Non-metric multidimensional scaling plot (NMDS) of geoduck gonad whole proteomic profiles generated by data dependent acquisition. Gonad proteomes differ among clams by both sex (male = orange, female = blue) and stage (early-stage = circles, mid-stage = squares, late-stage = triangles; p<0.05).

Impact of Award

Beyond contributing to the fundamental knowledge of marine mollusk reproduction, this award produced numerous publications and provided the basis for further funding and proposal submissions. In addition, the transcriptomic data was the basis for the course: Bioinformatics for Transcriptomic and Epigenomic Analyses – Centro de Investigación Científica y de Educación Superior de Ensenada, B.C. (CICESE), 19-24 October 2015.

Further Funding

Currently two projects have been funded that were based on this project and others have been submitted. Funded projects include: Proteomic response of shellfish to environmental stress; Department of Natural Resources $107,805 and Elucidating the physiological and epigenetic response of tetraploid and triploid Pacific Oysters to environmental stressors; NOAA $178,898. Submitted proposals include one to NOAA on the development of new clam species for aquaculture.

Publications

Crandall, Grace; Roberts, Steven (2016): Reproductive Maturation in Geoduck clams (Panopea generosa). figshare.
https://dx.doi.org/10.6084/m9.figshare.3205975.v1
Retrieved: 14:41, Dec 23, 2016 (GMT)
This fileset includes a research paper describing reproductive maturation in geoduck clams with 200 images of gonadal histological sections and associated datasheets. Downloads = 1761.

Emma B. Timmins-Schiffman, Grace A. Crandall, Brent Vadopalas, Michael E. Riffle, Brook L. Nunn, Steven B. Roberts (2016) Integrating proteomics and selected reaction monitoring to develop a non-invasive assay for geoduck reproductive maturation
bioRxiv 094615; doi: https://doi.org/10.1101/094615

[Data] Transcriptomic profiles of adult female & male gonads in Panopea generosa (Pacific geoduck).
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA316216

Panopea gonad transcriptome
Open Science Framework Project
https://osf.io/3xf6m/

[Data] Geoduck (Panopea generosa) gonad DDA LC-MS/MS
https://www.ebi.ac.uk/pride/archive/projects/PXD003127

[Code] Source Code for GO Analysis in Geoduck Gonad Background
https://github.com/yeastrc/compgo-geoduck-public

[Data] Geoduck (Panopea generosa) gonad DIA LC-MS/MS
https://www.ebi.ac.uk/pride/archive/projects/PXD004921

[Data] Selected reaction monitoring of geoduck gonad peptides to develop biomarkers of reproductive maturation status
http://www.peptideatlas.org/PASS/PASS00943

[Data] Selected reaction monitoring of geoduck hemolymph peptides to develop biomarkers of reproductive maturation status
http://www.peptideatlas.org/PASS/PASS00942


Getting back on tracks

Yesterday I uploaded v0.0.1 of the Geoduck genome to CoGe.

Now I want to start adding tracks. To do this I used CLC to create RNA-seq tracks from our male and female gonad transcriptome data.

As expected, only a small fraction of the reads mapped, because we are limiting the genome to the 22 scaffolds longer than 100 kb.

Males
CLC_Genomics_Workbench_8_5_1_1CAD8B6E.png

Females
CLC_Genomics_Workbench_8_5_1_1CAD8BCB.png

One thing to point out (and to follow up on) is that many more female reads mapped back.

I took the Reads data and exported to BAM.
CLC_Genomics_Workbench_8_5_1_1CAD8C40.png

Then uploaded to CoGe.
I called this Version 1, and interestingly I got some cool options, so I selected them.

CoGe__My_Data_1CAD8C90.png

This included saving as a Notebook.

CoGe__My_Data_1CAD8CC2.png


This finished in less than 5 minutes!
CoGe__My_Data_1CAD8CEB.png

The SNP view.
CoGe__My_Data_1CAD8D42.png

Voila – we have it in a Browser.
JBrowse_scaffold860_1__221360_and_Getting_back_on_track_md_1CAD9095.png

and you can zoom in
JBrowse_scaffold4546_56623__57472_1CAD8E29.png

Here we have a Notebook view
CoGe__My_Data_1CAD8E81.png
It is now public, though I am not quite sure if there is a URL.

Everything is public so please give it a look / twirl.
CoGe__My_Data_1CAD8F10.png


Panoplea of data

We have had the data for a draft genome of Panopea generosa for a bit. Here is a quick look.

All raw data is available @ http://owl.fish.washington.edu/nightingales/P_generosa/

With a first pass assembly here.

There are over 14 million scaffolds at this point, with 22 scaffolds greater than 100,000 bp. We are using those to kick the tires of CoGe and see if it is a good portal for analysis and sharing.
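Pulling out only the long scaffolds can be sketched in a few lines of Python. This is just an illustration of the idea, not how the 22-scaffold subset was actually made; the function names `read_fasta` and `long_scaffolds` and the toy records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def long_scaffolds(lines: Iterable[str], min_len: int = 100_000) -> Dict[str, int]:
    """Return {name: length} for records strictly longer than min_len."""
    return {n: len(s) for n, s in read_fasta(lines) if len(s) > min_len}


# Toy input with a tiny threshold so the example runs instantly.
toy = [">scaffold1", "ACGT" * 3, ">scaffold2", "AC", ">scaffold3", "ACGTACGT"]
kept = long_scaffolds(toy, min_len=5)
print(kept)
```

On a real assembly one would pass an open file handle instead of the toy list and leave `min_len` at its default of 100,000.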

Banners_and_Alerts_and_CoGe__My_Data_and_CLC_Genomics_Workbench_8_5_1_and_Add_New_Post_‹_half-shell_—_WordPress_1CAC76A2.png

 

There is not much to see now in the genome browser, but hopefully there will be more soon.
CoGe__My_Data_1CAC7724.png


From Ensenada and beyond

There have not been many posts recently, but that is not to say I have not been doing any science. Much of what I have been doing is numerous burst commits on the Panopea transcriptome paper / project. This can be found @ https://github.com/sr320/paper-pano-go.

You can see all the Jupyter nbs in this sub-directory. I will highlight some of my proudest cell moments here:

OK, the first one is so good I will just give you the whole thing, except that I will provide a few more comments ##

!wc -l analyses/Geoduck-transcriptome-v2.tab
  154407 analyses/Geoduck-transcriptome-v2.tab

!head -5 analyses/Geoduck_v2_blastn-NT.out
comp190_c0_seq1 gi  315593157   gb  CP002417.1      84.50   200 31  0   1   200 2271015 2271214 1e-47     198   Bacteria    Variovorax paradoxus EPS, complete genome   595537  Variovorax paradoxus EPS    Variovorax paradoxus EPS    b-proteobacteria
comp1900_c0_seq1    gi  481319564   gb  CP003293.1      100.00  271 0   0   1   271 1334370 1334640 1e-138    501   Bacteria    Propionibacterium acnes HL096PA1, complete genome   1134454 Propionibacterium acnes HL096PA1    Propionibacterium acnes HL096PA1    high GC Gram+
comp2164_c0_seq1    gi  221728669   gb  CP001392.1      98.47   261 4   0   1   261 721134  721394  2e-126    460   Bacteria    Acidovorax ebreus TPSY, complete genome 535289  Acidovorax ebreus TPSY  Acidovorax ebreus TPSY  b-proteobacteria
comp2742_c0_seq1    gi  527256352   ref XM_005146392.1      85.65   230 33  0   16  245 2293    2522    7e-61     243   Eukaryota   PREDICTED: Melopsittacus undulatus exostosin-like glycosyltransferase 3 (EXTL3), mRNA   13146   Melopsittacus undulatus budgerigar  birds
comp3315_c0_seq1    gi  156627645   gb  AC209228.1      79.13   206 36  6   3   202 76584   76380   1e-28     135   Eukaryota   Populus trichocarpa clone POP075-L19, complete sequence 3694    Populus trichocarpa black cottonwood    eudicots

#Let's subset the above table to non-Eukaryotes
#Here I am letting awk know we are dealing with tabs,
#and I want to have all rows where column 17 is NOT Eukaryota.
!awk -F"\t" '$17 != "Eukaryota" {print $1, $17 ,$15}' analyses/Geoduck_v2_blastn-NT.out \
> analyses/Non-Eukaryota-Geoduck-v2

!sort analyses/Non-Eukaryota-Geoduck-v2 > analyses/Non-Eukaryota-Geoduck-v2.sorted

!sort analyses/Geoduck-transcriptome-v2.tab > analyses/Geoduck-transcriptome-v2.sorted

!head -2 analyses/Geoduck-transcriptome-v2.sorted
comp100000_c0_seq1      TGAATGTATGTTTGTGAACGTATGTATATGAATGTATGTATGTGAATGCATACCATCTGTATAAGTATAATCCGACCGGGAGATGTTTATTCACACAGTTTGGCATTATGACGTTTAACCTCTGAATTGGCGTCTCTTGCTACTGACCGCTTCACAGTGATGACCCATGTTGTCACTTCTAATCTATTTATGTATTGCTCTTTTATATTATACTATTAACGCTGTTAAAATACACTACCGCTAAACAAATAAGAGTTGTGGGTTTGAATCATTGGTGAGTGCAGGAACCTCAGCAGGTCATTAAGATTTACGTGTACGTCTATCCTAAACCTACATGTTTCAACTTTGTTGTTTTTCGGTTTCGTTCTCTGTACACAGCTCTTGAAACATACATAAAATACCATTTTATTAAAAAATATGTCTCTATTTAATGTTAAAACCTTTTTAAGAAAA
comp100001_c1_seq1      GCTTTACCAGTTGTTACAAACATTTTAATAGTTATAGTTAATATACACAACATTTTAAATTAACTAGTGTCGAGAACTTGAGTCGGACATAGAGAATTAAATGTTTGTTGAACTTTAGCCAAGCACTTTTATTCTATTACTTTTTGAAGTATTTAATACCTTAAAATAATGGAATACTCCTGTAGAGTCCTTGAAGCCATCAACAATTTACCAACCTCCAAATAAAATATGAATATATTTTACATGATGAATTTACATAATGGATATATCATTGATATCTGCCAAGTTAACTTCACCTACCATTTTTAAGCTTACTTTGACCATGTTAGTTGGTATTGTGTATATAACGAGTGGGAGGACATTCATACCTGGCATTTGTTTGGTCAAACTGACACAAGATTTATGTTTATTTCAAACCTATATATAAAACAAGTCTCAATGAATATCTTCCTAGGCACAAGACAATGCTGATAAAATGTCTTGTTCAAGGACA


# joining with -v 1 prints only the lines of file 1 that have no match in file 2
!join -v 1 analyses/Geoduck-transcriptome-v2.sorted \
analyses/Non-Eukaryota-Geoduck-v2.sorted | wc -l
  153982

!join -v 1 analyses/Geoduck-transcriptome-v2.sorted \
analyses/Non-Eukaryota-Geoduck-v2.sorted \
> ../data-results/Geoduck-transcriptome-v3.tab

!head -2 ../data-results/Geoduck-transcriptome-v3.tab
comp100000_c0_seq1 TGAATGTATGTTTGTGAACGTATGTATATGAATGTATGTATGTGAATGCATACCATCTGTATAAGTATAATCCGACCGGGAGATGTTTATTCACACAGTTTGGCATTATGACGTTTAACCTCTGAATTGGCGTCTCTTGCTACTGACCGCTTCACAGTGATGACCCATGTTGTCACTTCTAATCTATTTATGTATTGCTCTTTTATATTATACTATTAACGCTGTTAAAATACACTACCGCTAAACAAATAAGAGTTGTGGGTTTGAATCATTGGTGAGTGCAGGAACCTCAGCAGGTCATTAAGATTTACGTGTACGTCTATCCTAAACCTACATGTTTCAACTTTGTTGTTTTTCGGTTTCGTTCTCTGTACACAGCTCTTGAAACATACATAAAATACCATTTTATTAAAAAATATGTCTCTATTTAATGTTAAAACCTTTTTAAGAAAA
comp100001_c1_seq1 GCTTTACCAGTTGTTACAAACATTTTAATAGTTATAGTTAATATACACAACATTTTAAATTAACTAGTGTCGAGAACTTGAGTCGGACATAGAGAATTAAATGTTTGTTGAACTTTAGCCAAGCACTTTTATTCTATTACTTTTTGAAGTATTTAATACCTTAAAATAATGGAATACTCCTGTAGAGTCCTTGAAGCCATCAACAATTTACCAACCTCCAAATAAAATATGAATATATTTTACATGATGAATTTACATAATGGATATATCATTGATATCTGCCAAGTTAACTTCACCTACCATTTTTAAGCTTACTTTGACCATGTTAGTTGGTATTGTGTATATAACGAGTGGGAGGACATTCATACCTGGCATTTGTTTGGTCAAACTGACACAAGATTTATGTTTATTTCAAACCTATATATAAAACAAGTCTCAATGAATATCTTCCTAGGCACAAGACAATGCTGATAAAATGTCTTGTTCAAGGACA

#Going from tab back to fasta!
!awk '{print ">"$1"\n"$2}' ../data-results/Geoduck-transcriptome-v3.tab \
> ../data-results/Geoduck-transcriptome-v3.fa

!fgrep -c ">" ../data-results/Geoduck-transcriptome-v3.fa
153982

tldr:

awk '$c != "string"'   join -v   awk '{print ">"$1"\n"$2}'
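For anyone who prefers to stay in Python inside the notebook, the same three steps (filter, anti-join, tab to FASTA) can be sketched as below. This is not the code used for the paper, just a minimal equivalent on made-up rows; the function names and the toy contigs are hypothetical, and the kingdom is assumed to sit in column 17 as in the table shown earlier.

```python
def non_eukaryote_ids(blast_lines, kingdom_col=17):
    """Collect query IDs whose kingdom column (1-based) is not 'Eukaryota'."""
    ids = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        # Like awk, treat a missing column as an empty string.
        kingdom = fields[kingdom_col - 1] if len(fields) >= kingdom_col else ""
        if kingdom != "Eukaryota":
            ids.add(fields[0])
    return ids


def drop_contigs(transcript_lines, unwanted):
    """Anti-join: keep (id, sequence) rows whose id is not in `unwanted`."""
    kept = []
    for line in transcript_lines:
        contig_id, seq = line.rstrip("\n").split("\t")[:2]
        if contig_id not in unwanted:
            kept.append((contig_id, seq))
    return kept


def to_fasta(rows):
    """Turn (id, sequence) rows back into FASTA text."""
    return "".join(f">{cid}\n{seq}\n" for cid, seq in rows)


def blast_row(contig_id, kingdom):
    # 15 placeholder fields stand in for columns 2-16 of the real table.
    return "\t".join([contig_id] + ["x"] * 15 + [kingdom, "description"])


# Hypothetical toy data, not real contigs.
blast = [blast_row("comp1_c0_seq1", "Bacteria"),
         blast_row("comp2_c0_seq1", "Eukaryota")]
transcripts = ["comp1_c0_seq1\tACGT", "comp2_c0_seq1\tGGCC", "comp3_c0_seq1\tTTAA"]

unwanted = non_eukaryote_ids(blast)
fasta = to_fasta(drop_contigs(transcripts, unwanted))
print(fasta, end="")
```

Using a set for the unwanted IDs means neither file has to be sorted first, which `join` requires.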

One more tidbit- I wanted to see how many blast hits were in the opposite direction "- frame".

thus:

!awk '($10-$9) < 0 {print $1, "\t", ($10-$9)}' \
../data-results/Geoduck-tranv2-blastx_sprot.tab \
> analyses/Geoduck-tranv2-minus_direction.tab
!head analyses/Geoduck-tranv2-minus_direction.tab
!wc -l analyses/Geoduck-tranv2-minus_direction.tab

comp95_c0_seq1   -230
comp146_c0_seq1      -173
comp195_c0_seq1      -185
comp296_c0_seq1      -200
comp455_c1_seq1      -197
comp943_c0_seq1      -218
comp1059_c0_seq1     -227
comp1683_c0_seq1     -206
comp1868_c0_seq1     -308
comp1910_c1_seq1     -248
   10413 analyses/Geoduck-tranv2-minus_direction.tab
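The same check can be sketched in Python. This is a minimal version assuming, as in the awk call, that whitespace-separated columns 9 and 10 hold the start and end coordinates of interest; the function name `minus_direction_hits` and the two toy rows are hypothetical and do not come from the real blastx table.

```python
def minus_direction_hits(lines, start_col=9, end_col=10):
    """Return (query_id, end - start) for rows where end < start.

    Columns are 1-based and split on any whitespace, like awk's default.
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < max(start_col, end_col):
            continue  # skip short or blank rows
        diff = int(fields[end_col - 1]) - int(fields[start_col - 1])
        if diff < 0:
            hits.append((fields[0], diff))
    return hits


# Hypothetical rows; only columns 1, 9 and 10 matter for this sketch.
toy = [
    "compA sp P1 NAME 50.0 80 40 0 240 10 1 77 1e-20 90",
    "compB sp P2 NAME 60.0 50 20 0 5 154 3 52 1e-15 70",
]
minus = minus_direction_hits(toy)
print(minus)
```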

Big day, big clam

Today we sampled the geoduck (Panopea generosa) for genome sequencing. Here is how things went down.

It was an early morning for the clam, peeking out to see a glorious sunrise on the porch.
2015-08-11_06_28_24_jpg__3_documents__3_total_pages__1B7A6D1E.png

From there it was off to the lab.

2015-08-11_06_31_33_jpg__3_documents__3_total_pages__1B7A6D8E.png

After cleaning the surfaces, Brent sampled tissue.

2015-08-11_09_35_11_jpg__7_documents__7_total_pages__1B7A6DB8.png

We started out targeting the foot and adductor muscles. These tissues were sterilely removed and then rinsed in 1% bleach, followed by Nanopure water. This tissue will be used for genome sequencing, as we predict it carries the fewest associated taxa.

IMG_2959_JPG__4_documents__4_total_pages__1B7AA359.png

Remaining tissues were taken, primarily for RNA-seq and divided into two boxes.

2boxes_1B7AA416.png

Tubes were labeled on cap with tissue type.
Here is what Box 1 looks like.

tissue-layout_1B7AA56F.png

Box 2 looks the same; however, it does not have a heart or style sample.

The only surprise was in sampling: labial palps were identified after we had already sampled a pair.


Second look at Geoduck transcriptome

Last week I popped out a quick assembly and annotation on our geoduck gonadal transcriptome. A second assembly was also done using Trinity.


Updates
August 3 – Confirmed // in file location had no impact on assembly.
July 14 – TransDecoder protein annotations
10:40am – added TransDecoder results
10:29am – added Stats via Trinity


Trinity.pl \
--seqType fq \
-JM 24G \
--left /Volumes/web/cnidarian/Geo_Pool_F_GGCTAC_L006_R1_001_val_1.fq /Volumes/web/cnidarian/Geo_Pool_M_CTTGTA_L006_R1_001_val_1.fq \
--right /Volumes/web/cnidarian//Geo_Pool_F_GGCTAC_L006_R2_001_val_2.fq /Volumes/web/cnidarian//Geo_Pool_M_CTTGTA_L006_R2_001_val_2.fq \
--CPU 16

trinity_out_dir_1B54203C.png

Output

0:999   127840
1000:1999   18164
2000:2999   5321
3000:3999   1817
4000:4999   762
5000:5999   291
6000:6999   135
7000:7999   73
8000:8999   22
9000:9999   29
10000:10999     4
11000:11999     5
12000:12999     3
13000:13999     4
14000:14999     4
15000:15999     3
16000:16999     0
17000:17999     2
18000:18999     1

Total length of sequence:   101862868 bp
Total number of sequences:  154480
N25 stats:          25% of total sequence length is contained in the 8095 sequences >= 2045 bp
N50 stats:          50% of total sequence length is contained in the 26158 sequences >= 1014 bp
N75 stats:          75% of total sequence length is contained in the 64574 sequences >= 446 bp
Total GC count:         37657770 bp
GC %:               36.97 %
hummingbird:Geo-trinity steven$ /Users/gilesg/compile/trinityrnaseq_r20131110/util/TrinityStats.pl /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta 


################################
## Counts of transcripts, etc.
################################
Total trinity transcripts:  154480
Total trinity components:   100155
Percent GC: 36.97

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 3444
    Contig N20: 2385
    Contig N30: 1766
    Contig N40: 1343
    Contig N50: 1014

    Median contig length: 371
    Average contig: 659.39
    Total assembled bases: 101862868


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

    Contig N10: 2999
    Contig N20: 2026
    Contig N30: 1462
    Contig N40: 1067
    Contig N50: 768

    Median contig length: 321
    Average contig: 553.88
    Total assembled bases: 55473621
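For reference, here is a minimal Python sketch of how an N50-style statistic is computed from a list of contig lengths. It follows the usual definition (the length at which the running total of the longest contigs reaches half of all bases) and is not taken from the TrinityStats.pl source; the toy lengths are made up.

```python
def nx(lengths, fraction=0.5):
    """Return the contig length L such that contigs of length >= L
    together hold at least `fraction` of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    raise ValueError("no contig lengths given")


toy_lengths = [2, 3, 4, 5, 6]           # 20 bases in total
n50 = nx(toy_lengths)                   # 6 + 5 = 11 >= 10
n25 = nx(toy_lengths, fraction=0.25)    # 6 >= 5
mean = sum(toy_lengths) / len(toy_lengths)
print(n50, n25, mean)
```

As a sanity check on the numbers reported above, 101862868 assembled bases divided by 154480 transcripts gives the 659.39 bp average contig length.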

Rerunning to see if the double slash was a problem; did not see any errors. Also running TransDecoder.


TransDecoder Results

Ran the following

/Users/gilesg/compile/trinityrnaseq_r20131110/trinity-plugins/TransDecoder_r20131110/TransDecoder -t  /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta

This provided a peptide file with 36003 sequences.

!head /Volumes/web-1/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep

>cds.comp100047_c0_seq2|m.5982 comp100047_c0_seq2|g.5982 ORF comp100047_c0_seq2|g.5982 comp100047_c0_seq2|m.5982 type:internal len:142 (-) comp100047_c0_seq2:3-425(-)
NAECRDLYKIFTQILSVRSQEGKIVIPDEFATKIRNWLGNKEELFKEAHNQKIITFYNEY
TREENTFNPIRGKRPMSVPDMPERKYIDQLSRKTQSQCDFCKYKTFTAEDTFGRIDSNFS
CSASNAFKLDHWHALFLLKTH
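The header line packs in the ORF type, peptide length, strand and contig coordinates. A small sketch for pulling those out in Python is below; it assumes every header follows the layout of the example shown here (which may differ in other TransDecoder releases), and the function name `parse_transdecoder_header` is hypothetical.

```python
import re

# Pattern matched against the tail of the header shown above, e.g.
# "type:internal len:142 (-) comp100047_c0_seq2:3-425(-)"
HEADER_RE = re.compile(
    r"type:(?P<type>\S+) len:(?P<len>\d+) \((?P<strand>[+-])\) "
    r"(?P<contig>\S+?):(?P<start>\d+)-(?P<end>\d+)\([+-]\)"
)


def parse_transdecoder_header(header):
    """Return a dict of ORF fields, or None if the header does not match."""
    match = HEADER_RE.search(header)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("len", "start", "end"):
        info[key] = int(info[key])
    return info


header = (
    ">cds.comp100047_c0_seq2|m.5982 comp100047_c0_seq2|g.5982 ORF "
    "comp100047_c0_seq2|g.5982 comp100047_c0_seq2|m.5982 "
    "type:internal len:142 (-) comp100047_c0_seq2:3-425(-)"
)
info = parse_transdecoder_header(header)
print(info)
```

Tallying `info["type"]` over all headers would give a quick breakdown of ORF types in the peptide file.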


Running blastp on Trinity.fasta.transdecoder.pep

!blastp \
-query /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep \
-db /usr/local/bioinformatics/dbs/uniprot_sprot.fasta \
-evalue 1e-5 \
-max_target_seqs 1 \
-max_hsps 1 \
-outfmt 6 \
-num_threads 4 \
-out /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep-blastp-uniprot-2.out

results: http://eagle.fish.washington.edu/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep-blastp-uniprot-2.out
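Since the run used `-outfmt 6`, the results file is a plain 12-column tab-delimited table, and a quick way to read it in Python can be sketched as follows. The column names are the standard BLAST+ defaults for format 6; the helper names and the three toy rows are hypothetical and are not lines from the real output.

```python
# Default column order for BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(lines):
    """Turn tab-delimited BLAST rows into a list of dicts keyed by column name."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        rows.append(dict(zip(OUTFMT6, line.split("\t"))))
    return rows


def best_hit_per_query(rows):
    """Keep the highest-bitscore row for each query sequence."""
    best = {}
    for row in rows:
        q = row["qseqid"]
        if q not in best or float(row["bitscore"]) > float(best[q]["bitscore"]):
            best[q] = row
    return best


# Made-up rows for illustration only.
toy = [
    "pepA\tsp|P00001|AAA_HUMAN\t45.0\t100\t55\t0\t1\t100\t5\t104\t1e-20\t90.5",
    "pepA\tsp|P00002|BBB_MOUSE\t60.0\t100\t40\t0\t1\t100\t9\t108\t1e-30\t120",
    "pepB\tsp|P00003|CCC_YEAST\t38.0\t80\t50\t1\t3\t82\t7\t86\t2e-08\t52.1",
]
best = best_hit_per_query(parse_outfmt6(toy))
print(len(best), best["pepA"]["sseqid"])
```

With `-max_target_seqs 1 -max_hsps 1` there should normally be at most one row per query already, so the best-hit step is mostly a safeguard.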
