Tag Archives: rna-seq

Genes (RNA-Seq) on Oly Genome

We currently have a version (0.0.2) of the Ostrea lurida genome on CoGe. It consists of the 38 scaffolds longer than 80k bp. Below is an effort to map gonad RNA-seq data to that genome.
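
A length cutoff like the one described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the CoGe version: the function names are hypothetical, and the file name in the usage comment is a placeholder.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_length: int) -> List[Tuple[str, str]]:
    """Keep records whose sequence is strictly longer than min_length."""
    return [(h, s) for h, s in records if len(s) > min_length]


# Hypothetical usage (the file name is a placeholder):
# with open("genome.fa") as fh:
#     kept = filter_by_length(read_fasta(fh), 80_000)
```

Reading the whole genome into memory is acceptable for a small draft assembly; a streaming writer would be a better fit for larger genomes.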

Two male and two female gonad libraries were mapped to the genome using TopHat in the CyVerse Discovery Environment.

Through the steps…



I moved the data in the Discovery Environment to the coge_data directory.


Will see what Expression Analysis does…


Some output


This created two files and corresponding tracks: read depth and BAM alignment


Will crank out other three libraries and soon will work on rough annotation.


Getting back on tracks

Yesterday I uploaded v0.0.1 of the Geoduck genome to CoGe.

Now I want to start adding tracks. To do this I used CLC to create RNA-seq tracks from our male and female gonad transcriptome data.

As expected, only a small number of reads mapped, because we are limiting the genome to the 22 scaffolds longer than 100k bp.



One thing to point out (and will have to be followed up on) is that many more Female reads mapped back.

I exported the reads data to BAM, then uploaded it to CoGe.
I called this Version 1 and, interestingly, got some cool options, so I selected them.


This included saving as a Notebook.


This finished in less than 5 minutes!

The SNP view.

Voila – we have it in a Browser.

and you can zoom in

Here we have a Notebook view
It is now public, though I am not sure whether there is a URL.

Everything is public so please give it a look / twirl.


RNAseq Data Receipt – Geoduck Gonad RNA 100bp PE Illumina

Received notification that the samples sent on 20150601 for RNAseq were completed.

Downloaded the following files from the GENEWIZ servers using FileZilla FTP and stored them on our server (owl/web/nightingales/P_generosa):


Generated md5 checksums for each file:

$ for i in *; do md5 "$i" >> checksums.md5; done
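
To confirm later that the stored files still match checksums.md5, something like the following Python sketch could be used. It assumes the BSD/macOS `md5` output format (`MD5 (filename) = digest`), and the function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# ASSUMPTION: lines look like the BSD/macOS `md5 FILE` output:
#   MD5 (FILE) = <32 hex digits>
MD5_LINE = re.compile(r"^MD5 \((?P<name>.+)\) = (?P<digest>[0-9a-f]{32})$")


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_file, directory="."):
    """Return {filename: bool} for every recognized line in checksum_file."""
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        match = MD5_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        name = match.group("name")
        target = Path(directory) / name
        # A missing file counts as a failed check.
        results[name] = target.is_file() and md5_of(target) == match.group("digest")
    return results
```

Calling `verify("checksums.md5")` from inside the data directory returns a dictionary in which any `False` entry marks a missing or altered file.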

Made a readme.md file for the directory.


First steps at an aggregated view of all DNA methylation data (updated)

It seems I have gotten close (see here), but I do not yet have a canonical IGV session that contains all of our DNA methylation data. The goal here is to generate such a product (and publish it, so I do not lose it).

All data is publicly available at


see also data on Figshare


July 2, 2015 – added Heat Shock experiment alternative splice track
June 26, 2015 – added link to Figshare version
June 26, 2015 – updated Archive.zip
June 26, 2015 – added numerous array tracks from heat stress array experiment including 3+ tracks.
June 26, 2015 – added new track from heat stress – Heat-multi-individual-dmr.bed
June 22, 2015 – updated Archive.zip
June 22, 2015 – updated MBD-seq track gills (no bisulfite treatment) to use unique mapping
June 22, 2015 – Updated EE2 linkout to go to Github
June 22, 2015 – Corrected error in labelling EE2 experiment tracks
June 15, 2015 – added MBD-seq track gills (no bisulfite treatment)
June 15, 2015 – added larval pesticide treatment tracks (bisulfite treatment)
June 15, 2015 – new IGV screenshot
June 15, 2015 – added HS-Cuffdiff_geneexp.sig.gtf (differentially expressed genes from heat-shock)




FileID | Description | Links
Crassostrea_gigas.GCA_000297895.1.26.gtf | gtf | ftp
MBD-Gill-meth | MBD enriched DNA library alignment | paper, info
BiGill_CpG_methylation | gill methylation 5x (MBD-BS, hi output) | paper
BiGill_exon_clc_rpkm | Corresponding exon-specific gene expression | paper
BiGo_CpG_methylation | male gamete methylation 5x (hi output) | paper
M1 | male gamete methylation 5x | preprint
M3 | male gamete methylation 5x | preprint
T1D3 | 72hpf larvae from M1 methylation 5x | preprint
T1D5 | 120hpf larvae from M1 methylation 5x | preprint
T3D3 | 72hpf larvae from M3 methylation 5x | preprint
T3D5 | 120hpf larvae from M3 methylation 5x | preprint
Heat-multi-individual-dmr.bed | Heat Stress (13 locations) common signal | notebook
2M_3plusmerge_Hyper.bed | merging adj probes to single interval | notebook
2M_3plusmerge_Hypo.bed | merging adj probes to single interval | notebook
4M_3plusmerge_Hyper.bed | merging adj probes to single interval | notebook
4M_3plusmerge_Hypo.bed | merging adj probes to single interval | notebook
6M_3plusmerge_Hyper.bed | merging adj probes to single interval | notebook
6M_3plusmerge_Hypo.bed | merging adj probes to single interval | notebook
2M_Hyper_3plusAdjactentProbes.gff | 3+ adjacent probes | notebook
2M_Hypo_3plusAdjactentProbes.gff | 3+ adjacent probes | notebook
4M_Hyper_3plusAdjactentProbes.gff | 3+ adjacent probes | notebook
4M_Hypo_3plusAdjactentProbes.gff | 3+ adjacent probes | notebook
6M_Hyper_3plusAdjactentProbes.gff | 3+ adjacent probes | notebook
6M_Hypo_3plusAdjactentProbes.gff | 3+ adjacent probes | notebook
2M_sig | Heat stress DMRs (array), ind.#2 | notebook, draft
4M_sig | Heat stress DMRs (array), ind.#4 | notebook, draft
6M_sig | Heat stress DMRs (array), ind.#6 | notebook, draft
HS-Cuffdiff_geneexp.sig.gtf | Heat stress differentially expressed genes | notebook
HS-Cuffdiff_altsplice.bed | Heat stress alternatively spliced genes | notebook
2M.bedgraph.tdf | RNA-seq from ind.#2 above – pretreatment | notebook, draft
4M.bedgraph.tdf | RNA-seq from ind.#4 above – pretreatment | notebook, draft
6M.bedgraph.tdf | RNA-seq from ind.#6 above – pretreatment | notebook, draft
2M-HS.bedgraph.tdf | RNA-seq from ind.#2 above – post-heatshock | notebook, draft
4M-HS.bedgraph.tdf | RNA-seq from ind.#4 above – post-heatshock | notebook, draft
6M-HS.bedgraph.tdf | RNA-seq from ind.#6 above – post-heatshock | notebook, draft
mgaveryDMRs_112212.gff | EE2 exposure DMRs (array) | paper
A01.smoothed | EE2 exposure array data – input versus input | paper
A02.smoothed | EE2 exposure array data – EE2 vs control | paper
A03.smoothed | EE2 exposure array data – EE2 vs control (dyeswap) | paper
YE_mixHYPER.bed | DMRs in pesticide exposed larvae (hypermethylated) |
YE_mixHYPO.bed | DMRs in pesticide exposed larvae (hypomethylated) |
YE_mix_22smCG3x | larvae (mix pesticide exposed) methylation |
YE_control_22smCG3x | larvae (control) methylation |


Anyone should be able to render this in IGV with this session file:


This work was supported in part by the National Science Foundation (NSF) under Grant Number 1158119 awarded to SR Roberts


Sample Submission – Geoduck Gonad for RNA-seq

Prepared two pools of geoduck RNA for RNA-seq (Illumina HiSeq2500, 100bp, PE) with GENEWIZ, Inc.

I pooled a set of female and a set of male RNAs that had been selected by Steven based on the Bioanalyzer results from Friday.

The female RNA pool used 210ng of each sample, with the exception of sample #08, which used 630ng. This was because there were no other female samples from this developmental time point, whereas the two other developmental time points each had three samples contributing to the pool. So, three times the quantity of the other individual samples was used to help equalize each time point’s contribution to the pooled sample. Additionally, 630ng used the entirety of sample #08.

The male RNA pool used 315ng of each sample. This number differs from the 210ng used for the female RNAs so that the two pools would end up with the same total quantity of RNA. However, now that I’ve typed this, this doesn’t matter since the libraries will be equalized before being run on the Illumina HiSeq2500. Oh well. As long as each sample in each pool contributed to the total amount of RNA, then it’s all good.
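
The pooling arithmetic above can be checked with a short Python sketch. The per-sample quantities and the female sample layout come from the text. The number of male samples is not stated, so the value computed at the end is only what the equal-totals intent would imply.

```python
# Quantities stated in the post (ng of RNA).
FEMALE_NG_PER_SAMPLE = 210
FEMALE_NG_SAMPLE_08 = 630
MALE_NG_PER_SAMPLE = 315

# Stated in the post: two time points contributed three samples each,
# and sample #08 was the only sample from its time point.
FULL_TIME_POINTS = 2
SAMPLES_PER_FULL_TIME_POINT = 3


def female_time_point_contributions():
    """Return the ng of RNA contributed by each female time point."""
    full = SAMPLES_PER_FULL_TIME_POINT * FEMALE_NG_PER_SAMPLE
    return [full] * FULL_TIME_POINTS + [FEMALE_NG_SAMPLE_08]


def female_pool_total_ng():
    """Total ng of RNA in the female pool."""
    return sum(female_time_point_contributions())


def implied_male_sample_count():
    """ASSUMPTION: both pools hold the same total, as the post intends."""
    return female_pool_total_ng() / MALE_NG_PER_SAMPLE
```

Each female time point contributes 630ng, which is the point of tripling sample #08, and the female pool totals 1890ng.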

The two pools were shipped O/N on dry ice.

  • Geo_pool_M
  • Geo_pool_F

Calculations (Google Sheet): 20150601_Geoduck_GENEWIZ_calcs


Examining HTS data upon arrival

Data is on public server http://owl.fish.washington.edu/nightingales/P_helianthoides/

In [2]:
from IPython.display import HTML
HTML('<iframe src=http://owl.fish.washington.edu/nightingales/P_helianthoides/ width=700 height=450></iframe>')

FastQC IPlant Option

See FastQC documentation for interpretation of results.
Upload fastq.gz files to IPlant (likely already done)


FastQC Local Option

FastQC can also be downloaded and run locally


In [43]:
from IPython.display import HTML
HTML('<iframe src= 
width=700 height=450></iframe>')



RNA-Seq – Sea Star Data Download

Received RNA-seq data from Cornell. They provided a convenient download script for retrieving all the data files at once (a bash script containing a series of wget commands, one per file URL). This is faster and easier than running wget separately for each file, and faster and easier than using the Synology “Download Station” app when so many URLs are involved.

Here’s the script (download.sh) that was provided:

wget -q -c -O 3291_5903_10007_H94MGADXX_V_CF71_ATCACG_R2.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=1160641846&refid=17091"
wget -q -c -O 3291_5903_10007_H94MGADXX_V_CF71_ATCACG_R1.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=505010539&refid=17092"
wget -q -c -O 3291_5903_10008_H94MGADXX_V_CF34_CGATGT_R1.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=636513375&refid=17093"
wget -q -c -O 3291_5903_10008_H94MGADXX_V_CF34_CGATGT_R2.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=1472734408&refid=17094"
wget -q -c -O 3291_5903_10009_H94MGADXX_V_CF26_TTAGGC_R2.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=948605937&refid=17095"
wget -q -c -O 3291_5903_10009_H94MGADXX_V_CF26_TTAGGC_R1.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=1810346594&refid=17096"
wget -q -c -O 3291_5903_10010_H94MGADXX_HK_CF2_TGACCA_R2.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=424477466&refid=17097"
wget -q -c -O 3291_5903_10010_H94MGADXX_HK_CF2_TGACCA_R1.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=630586816&refid=17098"
wget -q -c -O 3291_5903_10011_H94MGADXX_HK_CF35_ACAGTG_R1.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=1392201335&refid=17099"
wget -q -c -O 3291_5903_10011_H94MGADXX_HK_CF35_ACAGTG_R2.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=1598310685&refid=17100"
wget -q -c -O 3291_5903_10012_H94MGADXX_HK_CF70_GCCAAT_R1.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=868072864&refid=17101"
wget -q -c -O 3291_5903_10012_H94MGADXX_HK_CF70_GCCAAT_R2.fastq.gz "http://cbsuapps.tc.cornell.edu/Sequencing/showseqfile.aspx?mode=http&cntrl=1074182214&refid=17102"

This is a bash script. However, the most direct way to download these files on our Synology server is to run it as an ash script, so I changed the first line of the script from “#!/bin/bash” to “#!/bin/ash”. I then placed the script in the target directory for our files (Eagle/web/whale/SeaStarRNASeq), SSH’d into our Synology (Eagle), changed to that directory, and ran the script (./download.sh).
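
Because the script runs wget quietly (-q), a quick check that every library ended up with both an R1 and an R2 file can be useful afterwards. The following is a minimal Python sketch, assuming the `_R1.fastq.gz` / `_R2.fastq.gz` naming seen in the script above. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches names such as 3291_5903_10007_H94MGADXX_V_CF71_ATCACG_R2.fastq.gz
READ_FILE = re.compile(r"^(?P<library>.+)_R(?P<read>[12])\.fastq\.gz$")


def unpaired_libraries(filenames):
    """Return the sorted library prefixes that lack either an R1 or an R2 file."""
    seen = defaultdict(set)
    for name in filenames:
        match = READ_FILE.match(name)
        if match:
            seen[match.group("library")].add(match.group("read"))
    return sorted(lib for lib, reads in seen.items() if reads != {"1", "2"})


# Hypothetical usage from inside the download directory:
# import os
# print(unpaired_libraries(os.listdir(".")))
```

An empty list means every library has both mates. It does not show that the files are complete, so comparing file sizes or checksums is still worthwhile.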


RNA Isolation – Colleen Sea Star (Pycnopodia) Coelomocyte Samples

Isolated RNA from the following samples (provided by Colleen Burge):

  • Bio 26 (a LARGE amount of tissue/debris in this sample!)
  • CF 2
  • CF 3
  • CF 17
  • CF 34
  • CF 35
  • CF 70
  • CF 71

Samples were initially flash frozen and then stored at -80C (no preservatives used). No cells/tissue were visible in any sample except Bio 26. Samples were homogenized in 1mL TriReagent. The remainder of the isolation used the Direct-zol RNA MiniPrep Kit (ZymoResearch) according to the manufacturer’s protocol (including the on-column DNase I procedure). Eluted with 50uL of 0.1% DEPC-treated H2O and spec’d on the NanoDrop1000.

Samples were stored in Shellfish RNA Box #5.


Samples CF 3 and CF 17 likely have insufficient total RNA for sequencing at Cornell (200ng minimum required).
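
The sufficiency check is simple arithmetic: total RNA is concentration times elution volume. The Python sketch below uses the 50uL elution volume and 200ng minimum from the text. It assumes the full elution volume is still available, which is an assumption and not something stated above. No measured concentrations are included because none are given here.

```python
ELUTION_VOLUME_UL = 50.0   # from the text
MINIMUM_TOTAL_NG = 200.0   # Cornell minimum, from the text


def total_rna_ng(concentration_ng_per_ul, volume_ul=ELUTION_VOLUME_UL):
    """Total RNA (ng) = concentration (ng/uL) x volume (uL)."""
    return concentration_ng_per_ul * volume_ul


def meets_minimum(concentration_ng_per_ul, volume_ul=ELUTION_VOLUME_UL):
    """True if the sample holds at least the minimum total RNA.

    ASSUMPTION: the whole elution volume is available for submission.
    """
    return total_rna_ng(concentration_ng_per_ul, volume_ul) >= MINIMUM_TOTAL_NG


# Lowest concentration that still reaches the minimum at full volume.
MINIMUM_CONCENTRATION_NG_PER_UL = MINIMUM_TOTAL_NG / ELUTION_VOLUME_UL
```

Under these assumptions a sample needs a NanoDrop reading of at least 4 ng/uL to reach 200ng.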

UPDATE 20140514 – CF2, CF34, CF35, CF70, CF71 sent to Cornell for Illumina RNA-seq on 20140514