Ostrea lurida

# Genes (RNA-Seq) on Oly Genome

We currently have a version (0.0.2) of the Ostrea lurida genome on CoGe. This version consists of the 38 scaffolds longer than 80 kb. Below is an effort to map gonad RNA-seq data to that genome.

Two male and two female gonad libraries were mapped to the genome using TopHat in the CyVerse Discovery Environment.

Through the steps…

I moved the data in the Discovery Environment to the coge_data directory.

Will see what Expression Analysis does…

Some output

This created two files and corresponding tracks: read depth and BAM alignment.

Will crank out the other three libraries and soon will work on a rough annotation.

# Epigenetic variation of two populations grown at common site

In a separate experiment from the one in which Fidalgo siblings were outplanted at two sites, we also examined Hood Canal (HC) and Oyster Bay (SS/South Sound) oysters grown at Clam Bay (Manchester). Descriptor.

These were the oysters Katherine Silliman spawned in the summer of 2015 and represent seed Jake outplanted years ago.

This was run against the BGI scaffolds >10k.

The results are quite interesting.

The full notebook can be found at https://github.com/sr320/nb-2016/blob/master/O_lurida/BSMAP-06-BGIv001.ipynb.

# Fidalgo offspring at two locations

We carried out whole genome BS-Seq on siblings outplanted at two sites: Fidalgo Bay (home) and Oyster Bay. Four individuals from each locale were examined.

A running description of the data is available @ https://github.com/RobertsLab/project-olympia.oyster-genomic/wiki/Whole-genome-BSseq-December-2015.

I need to look back to a genome to analyze this. We did some PacBio sequencing a while ago.
– http://nbviewer.jupyter.org/github/sr320/ipython_nb/blob/master/OlyO_PacBio.ipynb

3058 of these reads were >10k bp: http://eagle.fish.washington.edu/cnidarian/OlyO_Pat_PacBio_10k.fa

Those 3058 reads were nicely assembled into 553 contigs: http://eagle.fish.washington.edu/cnidarian/OlyO_Pat_PacBio_10k_contigs.fa

Step forward a bit and all 47475 reads were assembled into the 5362 contigs known as OlyO_Pat_v02.fa http://owl.fish.washington.edu/halfshell/OlyO_Pat_v02.fa

The latter (v02) was used to map the 8 libraries. Roughly 8% of reads mapped.

And with a little filtering

Note that the awk script filtered for 10x coverage! This threshold could be altered.
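The coverage filter can be sketched in Python. This is a minimal sketch and not the awk script used in the notebook: it assumes tab-separated input with a header row and a column named `coverage`, and the function name `filter_by_coverage` and the example rows are made up for illustration.

```python
import csv
import io

def filter_by_coverage(lines, min_coverage=10, coverage_col="coverage"):
    """Yield rows (as dicts) whose coverage is at least min_coverage.

    `lines` is any iterable of tab-separated text lines with a header row.
    The column name is an assumption; change it to match the real file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if float(row[coverage_col]) >= min_coverage:
            yield row

# Tiny made-up example: three CG positions with different read depth.
example = io.StringIO(
    "chr\tpos\tcoverage\tratio\n"
    "scaffold1\t100\t12\t0.83\n"
    "scaffold1\t250\t4\t0.50\n"
    "scaffold2\t75\t10\t0.10\n"
)
kept_10x = list(filter_by_coverage(example, min_coverage=10))
example.seek(0)  # rewind so the same data can be filtered again
kept_3x = list(filter_by_coverage(example, min_coverage=3))
```

Keeping the threshold as a parameter makes it easy to rerun the same data at 10x and at 3x and compare how many loci survive.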

and R have an intriguing relationship

## With BGI Draft Genome

Following the same workflow with the BGIv1 scaffolds >10k bp, about 16% of reads map.

3 fold coverage

Again, making sure there is 10x coverage at a given CG locus, we get

Much weaker if we require only 3x coverage at a given CG locus

and the bit of R code

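setwd("/Volumes/web-1/halfshell/working-directory/16-04-05")

library(methylKit)

# NOTE: the opening of this call was lost in the original listing,
# and the entry for library 1 may be missing.
file.list <- list(
  'mkfmt_2_CGATGT.txt',
  'mkfmt_3_TTAGGC.txt',
  'mkfmt_4_TGACCA.txt',
  'mkfmt_5_ACAGTG.txt',
  'mkfmt_6_GCCAAT.txt',
  'mkfmt_7_CAGATC.txt',
  'mkfmt_8_ACTTGA.txt'
)

# myobj was not defined in the original listing. ASSUMPTION: the files are read
# with methRead(); sample IDs, assembly label, and treatment vector are placeholders.
myobj <- methRead(file.list,
                  sample.id = as.list(as.character(2:8)),
                  assembly = "OlyO",
                  treatment = c(0, 0, 0, 1, 1, 1, 1))

meth <- unite(myobj)
nrow(meth)
getCorrelation(meth, plot = FALSE)
# The original line read "hc PCA<-PCASamples(meth)"; the hc assignment was truncated.
# ASSUMPTION: it was a clustering call along these lines.
hc <- clusterSamples(meth, dist = "correlation", method = "ward", plot = TRUE)
PCA <- PCASamples(meth)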

# Data check on Oly BS-Seq samples

For the 12 samples

Select 4 samples from 1NF gel take 2
Select 4 samples from 2NF gel take 2

Select 2 from gel take 2 Lotterhos
M1
M2
M3

Select 2 from the following sent to Katie (do not have to run on gel)
NF2 14
NF2 6
NF2 18
NF2 15
NF2 17

## Short term will just check out the first 8.

These are samples outplanted at Oyster Bay and Fidalgo, and in both cases parents from Fidalgo.

The hypothesis is that epigenetic patterns will differ between sites and that the difference can be attributed to environment.

## Quick look at raw data

Sequencing Platform: Illumina HiSeq2500

Read Type/Length: 50bp single-end, single index

11_GGCTAC_L001_R1_001.fastq.gz    10933121

12_CTTGTA_L001_R1_001.fastq.gz    10816647

1_ATCACG_L001_R1_001.fastq.gz    9402890

2_CGATGT_L001_R1_001.fastq.gz    11954873

3_TTAGGC_L001_R1_001.fastq.gz    11817358

4_TGACCA_L001_R1_001.fastq.gz    11606618

5_ACAGTG_L001_R1_001.fastq.gz    12589609

6_GCCAAT_L001_R1_001.fastq.gz    12489766

7_CAGATC_L001_R1_001.fastq.gz    10295293

8_ACTTGA_L001_R1_001.fastq.gz    14374642
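Read counts like those listed above can be reproduced with a few lines of Python. This is a minimal sketch, assuming standard four-line FASTQ records; the function name `count_fastq_reads` and the demo file are made up for illustration, and it is not necessarily how the counts above were generated.

```python
import gzip
import os
import tempfile

def count_fastq_reads(path):
    """Count reads in a FASTQ file (gzipped or plain), assuming 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4

# Demo on a tiny made-up file with two reads.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo, "wt") as out:
        out.write("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
    demo_count = count_fastq_reads(demo)
```

Pointing the function at each `*_L001_R1_001.fastq.gz` file would give one count per library.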

## Unzip

In [1]:
cd /Volumes/Histidine/hectocotylus/whole-BS-01

/Volumes/Histidine/hectocotylus/whole-BS-01

In [2]:
%%bash
for f in *.gz
do
STEM=$(basename "${f}" .gz)
gunzip -c "${f}" > /Volumes/Histidine/hectocotylus/whole-BS-01/fq/"${STEM}"
done


## FastQC

In [3]:
!/Applications/bioinfo/FastQC/fastqc \
-o /Volumes/Histidine/hectocotylus/whole-BS-01/fq/ \
-t 4 \
/Volumes/Histidine/hectocotylus/whole-BS-01/fq/*

Started analysis of 1_ATCACG_L001_R1_001.fastq
Started analysis of 2_CGATGT_L001_R1_001.fastq
Started analysis of 3_TTAGGC_L001_R1_001.fastq
...


This unusual pattern seems to hold true.


# Since you’ve been gone

Soon after Ensenada I went to Chile, SICB, and PAG (in that order). The new year is often a time to let go of lingering projects, and I will likely be doing that soon. But to bring a few pending efforts to the forefront so that I can analyze them, here is a bit of data that is (or soon will be) coming in.
Much of this is centered around Ostrea lurida.

The first batch was 2bRAD data.

The full list of samples is here.

These raw data are here.

A quick fastqc….

We also now have a fresh set of MBD-BS.. now out for sequencing.
Pregame here

And just some plain old BS

Details

# Running the numbers

Today I brought another 16 samples into the DNA extraction pipeline. These were subjected to 30 minute ProK digestion at 60C as opposed to overnight.

It became clear the numbering scheme was complicated, so I converted to 1-32. Will keep this running forward and cross-reference to the original IDs.

In the future the number scheme will be simpler, but here is the actual run down.

Approximately 300ul of aqueous solution was recovered and taken through the protocol by Sam.

First look seems pretty good.

Sam will run a gel tomorrow. Note that Katherine is a bit ahead of us, so we have something to compare to.

# Upon improving extractions

In an attempt to determine the most efficient means of getting high quality DNA from the archived oyster samples, a suite of samples was started with M1 buffer and ProK.

The first ‘unit’ to go through will be the April Dabob samples. For today, I took 16 samples from the Hood Canal population (code 3H13-16; yellow tubes). Eight samples were homogenized with a plastic mortar and the other 8 did not get a mortar poke.

All samples were vortexed and placed at 37C at 2pm. This was done in an air incubator, non-shaking.

Tomorrow, an additional set of samples will undergo a shorter Proteinase K digestion and be run through the Mollusc DNA extraction kit to try to determine the best way to scale up.

# A few outliers remaining

Integrating last bit of qPCR data into master datasheet. This includes 5 runs post 8/15.

likely error above – second EF1 was 18s test.

CARM
Looks nice, corrected…

Elong factor
Correction is fine, but mechanical reps have some issues

As noticed in the last batch of analysis, these need to be checked at the raw data level

28s

Thoughts on normalizing gene…
Noting that one EF1 rep was thrown out for the mechanical samples (see above). Here is a crude look at EF1, actin, and 28s respectively…

# Finishing out with the mechanical

Currently there is a pretty robust spreadsheet and over the past few days Jake has cranked through some reps to see how the oysters that were mechanically stressed hold up. Below is how these data are integrated.

Currently the 8-10 samples (yellow) have been skipped, but we might have a look.

First up is having a look at the new HSP 70 reps. The mechanical data still need some better resolution. Hopefully the 8-10 samples might shed some light.

Next up is two more reps of PGEEP4.
Looks good, and given the doubling of reps we could easily drop ‘outlier’ runs and still have triplicates, tight triplicates.
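One way to pick a tight triplicate from a larger set of reps is sketched below. This is only an illustration under stated assumptions, not the procedure used on the spreadsheet: it keeps the three Ct values closest to the median, and the function name `tightest_triplicate` and the Ct values are made up.

```python
from statistics import median

def tightest_triplicate(ct_values):
    """Return the three Ct values closest to the median of all reps, sorted."""
    if len(ct_values) < 3:
        raise ValueError("need at least three reps")
    mid = median(ct_values)
    # Rank reps by distance from the median and keep the nearest three.
    closest = sorted(ct_values, key=lambda ct: abs(ct - mid))[:3]
    return sorted(closest)

# Made-up Ct values for six reps of one sample, with two obvious outliers.
reps = [24.1, 24.4, 24.2, 26.9, 24.0, 22.5]
kept = tightest_triplicate(reps)
```

Any rule like this should be checked against the raw traces before a run is dropped, since a distant value can be a real signal rather than a bad rep.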

GRB2… now good to go, with the first pair of reps dead on.

BMP2…. could use some help from the other mechanical stress runs

TLR….seemed like a relatively easy fix (besides no detection) in that just needed to correct for machine.

And the correction, which reflects the fact that expression was so low it could only be detected by the Opticon.

The 8-15 runs had minimal control and temp samples, with mechanical samples run in duplicate.

This needs a little caressing before integrating into the data.
This should be in two columns, with empty cells where no samples were run, in this order.

H_C_1
H_C_2
H_C_3
H_C_4
H_C_5
H_C_6
H_C_7
H_C_8
N_C_1
N_C_2
N_C_3
N_C_4
N_C_5
N_C_6
N_C_7
N_C_8
S_C_1
S_C_2
S_C_3
S_C_4
S_C_5
S_C_6
S_C_7
S_C_8
H_T_1
H_T_2
H_T_3
H_T_4
H_T_5
H_T_6
H_T_7
H_T_8
N_T_1
N_T_2
N_T_3
N_T_4
N_T_5
N_T_6
N_T_7
N_T_8
S_T_1
S_T_2
S_T_3
S_T_4
S_T_5
S_T_6
S_T_7
S_T_8
H_M_1
H_M_2
H_M_3
H_M_4
H_M_5
H_M_6
H_M_7
H_M_8
N_M_1
N_M_2
N_M_3
N_M_4
N_M_5
N_M_6
N_M_7
N_M_8
S_M_1
S_M_2
S_M_3
S_M_4
S_M_5
S_M_6
S_M_7
S_M_8
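The ordering above can be generated rather than typed, which helps when lining up a partial run against the master sheet. This is a minimal sketch: the letter codes are copied from the list, the reading of C/T/M as control, temperature, and mechanical is an assumption, and the function names and Ct values are hypothetical.

```python
def sample_labels(populations=("H", "N", "S"), treatments=("C", "T", "M"), n_reps=8):
    """Build labels in the sheet order: treatment, then population, then rep number."""
    return [
        f"{pop}_{trt}_{rep}"
        for trt in treatments          # C, T, M (assumed: control, temp, mechanical)
        for pop in populations         # H, N, S as written in the list
        for rep in range(1, n_reps + 1)
    ]

def two_column_rows(dups_by_sample, order):
    """Return (label, rep1, rep2) rows, with '' where a sample or rep was not run."""
    rows = []
    for label in order:
        reps = list(dups_by_sample.get(label, []))[:2]
        reps += [""] * (2 - len(reps))  # pad missing reps with empty cells
        rows.append((label, reps[0], reps[1]))
    return rows

labels = sample_labels()
# Made-up duplicate Ct values for a run that covered only two samples.
run = {"H_C_1": [24.1, 24.3], "S_M_8": [30.2]}
rows = two_column_rows(run, labels)
```

The resulting rows can be pasted straight into the spreadsheet, with blank cells already in place for samples that were skipped.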


8-15 run update

Actin

Mechanical looks decent after correcting.

However, taken together, the difference in crude expression levels is bothersome.

Carm