Normally I would not consider a week-in-review post, but so little progress was made (better than nothing) that I thought I would give it a shot. Monday and Tuesday I was in Oregon giving a seminar, “Genomics on the Half Shell: Environmental Epigenetics, Open Science, and the Oyster”. (Yes, I will use that as an excuse.)
On the epigenetics and ocean acidification front I think we have a way forward. In short, the following will get 32% mapping:
!/Users/Shared/Apps/bsmap-2.74/bsmap -a 20150506_trimmed_2212_lane2_CTTGTA_L002_R1_001.fastq.gz -d /Users/Shared/data/oyster.v9_90.fa -o tmp-4.sam -n 1 -L 30 -p 8 -v 5
One hurdle overcome in this effort was getting rid of more artifact sequence. Sam cleaned up a file to get us some straight lines, then I invoked -L to get rid of the “G rise”.
The second big issue was understanding (thanks to Mac!) that I needed to pay attention to the mapping strand information:
-n [0,1] set mapping strand information. default: 0
-n 0: only map to 2 forward strands, i.e. BSW(++) and BSC(-+),
for PE sequencing, map read#1 to ++ and -+, read#2 to +- and --.
-n 1: map SE or PE reads to all 4 strands, i.e. ++, +-, -+, --
With that, and by flexing -v, we can get mapping that can then be analyzed. I will wait on pulling the trigger until we hear from the NSF on going for a full proposal. In the meantime I would still like to know what is going on in those first 30 bp.
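For double-checking a number like the 32% mapping above, here is a minimal Python sketch that counts mapped versus unmapped records in a SAM file using the standard unmapped flag bit (0x4). It assumes the SAM file actually contains records for unmapped reads; whether bsmap writes them depends on how it was run, and if it does not, the denominator would have to come from counting reads in the FASTQ instead. The function name `mapping_rate` is my own and not part of any tool.

```python
import gzip


def mapping_rate(sam_path):
    """Return (mapped, total, fraction_mapped) for a SAM file.

    Assumption: unmapped reads are present in the file, so that the
    total number of records is a fair denominator.
    """
    mapped = 0
    total = 0
    # Allow plain or gzip-compressed SAM text.
    opener = gzip.open if str(sam_path).endswith(".gz") else open
    with opener(sam_path, "rt") as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete alignment record
            flag = int(fields[1])
            if flag & 0x900:
                continue  # skip secondary (0x100) and supplementary (0x800) records
            total += 1
            if not flag & 0x4:  # 0x4 set means the read is unmapped
                mapped += 1
    fraction = mapped / total if total else 0.0
    return mapped, total, fraction
```

Something like `mapped, total, frac = mapping_rate("tmp-4.sam")` would then give the fraction for the output of the command above.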
While working on a chapter I got diverted into trying to identify the gene sequences analogous to the sex-specific genes from Dheilly et al.:
Dheilly, Nolwenn M.; Lelong, Christophe; Huvet, Arnaud; Kellner, Kristell; Dubos, Marie-Pierre; Riviere, Guillaume; Boudry, Pierre; Favrel, Pascal (2012): Gametogenesis in the Pacific Oyster Crassostrea gigas: A Microarrays-Based Analysis Identifies Sex and Stage Specific Genes. File_S1.xls. PLOS ONE.
10.1371/journal.pone.0036353.s001. Retrieved 14:28, May 08, 2015 (GMT).
In NCBI I was able to get the details of the array platform.
This file was loaded up to the beta version of SQLShare (http://sqlshare.uw.edu/).
And with a few joins…
SELECT *
FROM [email@example.com].[Dheilly-File_S1_1] s
LEFT JOIN [firstname.lastname@example.org].[Dheilly-array-design] array
  ON s.[Genbank Acc] = array.GB_ACC
LEFT JOIN [email@example.com].[table_Roberts_Sigenae6_transcriptome.tab] six
  ON array.ContigName = six.Column1
and a little more work I can get a fasta for Blast purposes.
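The join-then-export idea can be sketched locally in Python with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the table names, column names, accessions, and sequences below are made-up toy data, not the real SQLShare tables, and `build_toy_db` and `rows_to_fasta` are hypothetical helpers. The point is the shape of the two left joins (gene list to array design to transcriptome) and dropping rows that end up without a sequence before writing FASTA.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with three tiny, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sex_genes (genbank_acc TEXT);
        CREATE TABLE array_design (gb_acc TEXT, contig_name TEXT);
        CREATE TABLE transcriptome (contig_name TEXT, sequence TEXT);

        INSERT INTO sex_genes VALUES ('ACC1'), ('ACC2'), ('ACC3');
        -- ACC3 has no probe on the toy array
        INSERT INTO array_design VALUES ('ACC1', 'contigA'), ('ACC2', 'contigB');
        -- contigB has no sequence in the toy transcriptome
        INSERT INTO transcriptome VALUES
            ('contigA', 'ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT');
        """
    )
    return conn


# Same shape as the query in the post: two left joins chained on shared keys.
JOIN_SQL = """
SELECT s.genbank_acc, a.contig_name, t.sequence
FROM sex_genes AS s
LEFT JOIN array_design AS a ON s.genbank_acc = a.gb_acc
LEFT JOIN transcriptome AS t ON a.contig_name = t.contig_name
ORDER BY s.genbank_acc
"""


def rows_to_fasta(rows, width=60):
    """Format (accession, contig, sequence) rows as FASTA text.

    Rows where the left joins found no sequence (None) are skipped.
    """
    records = []
    for acc, contig, seq in rows:
        if not seq:
            continue
        wrapped = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
        records.append(f">{acc} {contig}\n{wrapped}\n")
    return "".join(records)


if __name__ == "__main__":
    conn = build_toy_db()
    print(rows_to_fasta(conn.execute(JOIN_SQL).fetchall()), end="")
```

Because these are left joins, every gene in the starting list survives the query even when nothing matches downstream, which makes it easy to see how many genes fall out at each step.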
Though with a little hindsight, maybe a better approach would be to use the probe sequences and see how they match up with the Ensembl version of the oyster genome.
And to prove I did not completely waste the week, I am considering how to address our reviews for the “Up in Arms” paper. As another means of assessing the full transcriptome, I have generated some data by comparing
Still need to take this forward from…