# DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline MethylKit Comparison

I previously ran two variations on the Bismark analysis for our Olympia oyster whole genome bisulfite sequencing data:

I followed this up using the MethylKit R package to identify differentially modified loci (DML), based on differing amounts of coverage (1x, 3x, 5x, & 10x) and percent methylation differences between the two groups of oysters (25%, 50%, & 75%).

Both sets of analyses were documented in R Projects:

##### RESULTS

BedGraphs (1x coverage, 25% diff in methylation):

###### “Relaxed” Bismark settings:

The BedGraph outputs from the least stringent coverage/percent difference in methylation for both Bismark pipelines yield suprisingly low numbers of DML.

They yield 22 and 21 DML, respectively. Of course, more stringent BedGraphs have fewer DML, but may be more believable due to having a more robust set of data.

Interestingly, the two analyses reveal that a single contig contains the majority of DML, all within a 1000bp range.

Will continue to examine this data by examining Bismark BedGraphs in IGV, and running some additional MethylKit analysis looking at differentially modified regions(DMRs) to see what we can gleen from this.

# DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline Comparison

Ran Bismark using our high performance computing (HPC) node, Mox, with two different bowtie2 settings:

1. Default settings

2. –score_min L,0,-0.6

The second setting is a bit less stringent than the default settings and should result in a higher percentage of reads mapping. However, not entirely sure what the actual implications will be (if any) for interpreting the resulting data.

Input data was previously trimmed per Bismark’s recommendation for Illumina TruSeq libraries (TrimGalore! 5′ 10bp):

List of input files and Bismark configurations can be seen in the SLURM scripts:

Output folders:

# DNA Methylation Analysis – Olympia oyster BSseq MethylKit Analysis

##### I’m posting this for posterity and to provide an overview of what (and whatnot) to do. Plus, this has a good R script for using MethylKit that can be used for subsequent analyses.

The goal of this analysis was to compare the methylation profiles of Olympia oysters originating from a common population (Fidalgo Bay) that were raised in two different locations (Fidalgo Bay & Oyster Bay).

An overview of the experiment can be viewed here:

I previously ran all of Olympia oyster DNA methylation sequencing data through the Bismark pipeline, and then processed them using the MethylKit R library.

First mistake (Bismark):

• Trimmed FastQ files “incorrectly”.

Bismark provides an excellent user guide and provides a handy table on how to decide on trimming parameters, but I mistakenly trimmed these according to the recommendations for a different library preparation technique. I trimmed based on the Zymo Pico-Methyl Kit (which was used for the other group of data that I processed simultaneously), instead of the TruSeq library prep.

So, “incorrectly” isn’t necessarily the proper term here. The analysis can still be used, however, it’s likely that the excessive trimming results in reducing sequencing coverage, and, in turn, making the downstream analysis result in a highly conservative output. Thus, the data isn’t wrong or bad, it is just very limited.

And, this leads to the second mistake (Bismark):

• Bowtie alignment score too strict

There’s a bit of a weird “battle” between Bismark and bowtie2. Bismark uses bowtie2 for generating alignments, but bowtie2’s default cutoff score overrides Bismark’s. So, to adjust the score value, you have to explicitly add the scoring parameters to your Bismark parameters during the alignment step. I did not do this.

Again, it’s not wrong, per se, but leads to a significantly limited set of data in the final analysis.

The data were analyzed based on a minimum of:

• 3x coverage

• 25% difference in methylation

#### RESULTS:

Methylkit analysis (R project; GitHub):

BedGraph file (BED):

The analysis resulted in a total of seven (yes, 7) differentially methylated loci (DML) between the two groups. It was this result that made Steven and me revisit the initial Bismark analysis. He has done this previously (but differently) and gotten significantly greater numbers of DML.

Knowing all of this, I will re-trim the data and adjust Bismark alignment score thresholds and then re-analyze with MethylKit.

Regardless here’re some plots to add some visual flair to this notebook entry (these, and more, are available in the GitHub repo):

# DNA Methylation Analysis – Bismark Pipeline on All Olympia oyster BSseq Datasets

Bismark analysis of all of our current Olympia oyster (Ostrea lurida) DNA methylation high-throughput sequencing data.

Analysis was run on Emu (Ubuntu 16.04LTS, Apple Xserve). The primary analysis took ~14 days to complete.

All operations are documented in a Jupyter notebook (GitHub):

Genome used:

Input files ( see Olympia oyster Genomic GitHub wiki for more info ):

##### WG BSseq of Fidalgo Bay offspring grown in Fidalgo Bay & Oyster Bay
• 1_ATCACG_L001_R1_001.fastq.gz

• 2_CGATGT_L001_R1_001.fastq.gz

• 3_TTAGGC_L001_R1_001.fastq.gz

• 4_TGACCA_L001_R1_001.fastq.gz

• 5_ACAGTG_L001_R1_001.fastq.gz

• 6_GCCAAT_L001_R1_001.fastq.gz

• 7_CAGATC_L001_R1_001.fastq.gz

• 8_ACTTGA_L001_R1_001.fastq.gz

##### MBDseq of two populations (Hood Canal & Oyster Bay) grown in Clam Bay
• zr1394_10_s456.fastq.gz

• zr1394_11_s456.fastq.gz

• zr1394_12_s456.fastq.gz

• zr1394_13_s456.fastq.gz

• zr1394_14_s456.fastq.gz

• zr1394_15_s456.fastq.gz

• zr1394_16_s456.fastq.gz

• zr1394_17_s456.fastq.gz

• zr1394_18_s456.fastq.gz

• zr1394_1_s456.fastq.gz

• zr1394_2_s456.fastq.gz

• zr1394_3_s456.fastq.gz

• zr1394_4_s456.fastq.gz

• zr1394_5_s456.fastq.gz

• zr1394_6_s456.fastq.gz

• zr1394_7_s456.fastq.gz

• zr1394_8_s456.fastq.gz

• zr1394_9_s456.fastq.gz

#### RESULTS:

With Bismark complete, these two sets of analyses can now be looked into further (and separately, as they are separate experiments) using things like MethylKit (R package) and
the Integrative Genomics Viewer (IGV).

Output folder:

Bismark Summary Report:

Individual Sample Reports:

# DNA Methylation Quantification – Acropora cervicornis (Staghorn coral) DNA from Javier Casariego (FIU)

Used the MethylFlash Methylated DNA Quantification Kit (Colorimetric) from Epigentek to quantify methylation in these coral DNA samples.

All samples were run in duplicate <em>except</em> 2h Block 1 due to insufficient DNA.

The following samples were used in a 1:10 dilution (2uL DNA : 18uL NanoPure H2O), due to their relatively high concentrations, to ensure accurate pipetting:

• 72h Block 4
• D14 Block 1
• D14 Block 2
• D14 Block 3
• D14 Block 4
• D14 Block 5
• D14 Block 6
• D14 Block 8
• D14 Block 10

All samples were diluted to a final concentration of 9.645ng/uL (154.24ng total; 17.6uL) in NanoPure water, which is equal to 77.12ng of DNA per assay replicate. These numbers were chosen based off of the sample with the lowest concentration.

The following samples were used in their entirety:

• 2h Block 8
• D35 Block 8

The spreadsheet became overly complicated because I initially forgot to account for the need to run each sample in duplicate.

The kit reagent dilutions were as follows:

• Diluted ME1: 52mL of ME1 + 464mL of <em>distilled</em> water
• Diluted ME4: 10uL of ME4 + 10uL of TE Buffer (pH=8.0; made by me on 20130408).
• Standard curve: Prepped per instruction manual, with double volumes for two plates.
• Diluted ME5: 50uL/well x 152well = 7600uL; 7600uL/1000 = 7.6uL; 7.6uL ME5 + 7592.4uL Diluted ME1
• Diluted ME6: 50uL/well x 152well = 7600uL; 7600uL/2000 = 3.8uL; 3.8uL ME6 + 7596.2uL Diluted ME1
• Diluted ME7: 50uL/well x 152well = 7600uL; 7600uL/5000 = 1.52uL; 1.52uL ME7 + 7598.48uL Diluted ME1

All diluted solutions were stored on ice for duration of procedure.

The remaining Diluted ME1 solution was stored at 4C (FTR 209), and is stable for 6 months, per the manufacturer’s instructions.

See the Results section below for plate layouts.

Plates were read at 450nm on the Seeb Lab Victor 1420 Plate Reader (Perkin Elmer) and the amount of DNA methylation was determined.

Results:

Individual sample methylation quantification (Google Sheet): A.cervicornis_DNA_Extractions(May_2017).xlsx

I’m not familiar with the experimental design, so I’m not going to spend time handling any of the in-depth analysis at this point in time. However, here’s the background on how methylation quantification and percent methylation were determined.

1. Mean absorbance (450nm) was determined for all samples and standard curve samples. It’s important to note that the standard deviation between replicates was not evaluated and there appears to be consistent variability between samples, but I’m not certain how much variation is “acceptable” with and assay of this nature.

2. The mean absorbance of the standard curve samples were plotted against their corresponding DNA amounts and a linear trendline was fitted to the points.

3. Per the manufacturer’s recommendations, the four points (including the zero point) that yielded the best linear fit (i.e. best R^2 value) were used and the slope of best fit line for those four points was determined.

4. This slope was then utilized in the equation provided by the manufacturer (see pg. 8 of the MethylFlash Kit manual).

# DNA Methylation Quantification – Coral DNA from Jose M. Eirin-Lopez (Florida International University)

Ran the coral DNA I quantified on 20160630 through the MethylFlash Methylated DNA Quantification Kit [Colorimetric] (Epigentek) kit to quantify global methylation.

Used 100ng of DNA per 8uL per replicate (x2 replicates = total 200ng in 16uL). Calcs are here (Google Sheet): 20160705_coral_DNA_methylation_calcs

Manufacturer’s protocol was followed.

Dilutions of kit reagents:

ME5 (1:1000) 2.6uL ME5 + 2597.4uL diluted ME1

ME6 (1:2000) 1.3uL ME6 + 2598.7uL diluted ME1

ME7 (1:5000) 0.52uL ME7 + 2599.48uL diluted ME1

Samples were quantified on the Seeb’s plate reader @ 450nm  (Wallac 1420 Victor 2  [Perkin Elmer])

Results:

 sample treatment 5-mC(ng) H1_1 nitrogen 0.8712248853 H1_10 nitrogen 0.6917168368 H1_12 control 0.2738478893 H1_5 nitrogen & phosphorous 0.9663585942 H1_6 control 0.6494783825 H1_8 nitrogen & phosphorous 0.4244913398 H24_1 nitrogen 0.372603297 H24_10 nitrogen 0.4237237786 H24_12 control 0.5350511937 H24_5 nitrogen & phosphorous 0.1495527697 H24_6 control 0.2291900804 H24_8 nitrogen & phosphorous 0.2213437801 H5_1 nitrogen -0.1233169902 H5_10 nitrogen 0.6997668774 H5_12 control 0.2307000493 H5_5 nitrogen & phosphorous -0.07790933048 H5_6 control 0.4562401662 H5_8 nitrogen & phosphorous 0.5949647121

Overall, it’s difficult to really interpret these results. I believe the data is a time course (e.g. H5 = hour 5, H24 = hour 24). Additionally, looking at treatments, there appear to be replicates, but it’s not clear what type of replicates they are (i.e. technical or biological). Generally, it seems like the control samples have lower quantities of methylated DNA than the treated samples. However, this doesn’t hold true for all three of the groups.

And, not that it really matters, but I don’t even know what species this is…

In any case, this was an attempt to gather some preliminary data for a grant that Steven is attempting to put together, so the original experiment and the subsequent data aren’t as robust as one would expect for a full-blown research project.

# Epigenetic variation of two populations grown at common site

In a different experiment compared to when Fidalgo siblings were outplanted at two sites, we also examined Hood Canal (HC) and Oyster Bay (SS/South Sound) grown at Clam Bay (Manchester). Descriptor.

These were the oysters Katherine Silliman spawned in the summer of 2015 and represent seed Jake outplanted years ago.

This was run against the BGI scaffolds >10k.

The results are quite interesting.

The full notebook can be found at https://github.com/sr320/nb-2016/blob/master/O_lurida/BSMAP-06-BGIv001.ipynb.

# Fidalgo offspring at two locations

We carried out whole genome BS-Seq on siblings outplanted out at two sites: Fidalgo Bay (home) and Oyster Bay. Four individuals from each locale were examined.

A running description of the data is available @ https://github.com/RobertsLab/project-olympia.oyster-genomic/wiki/Whole-genome-BSseq-December-2015.

I need to look back to a genome to analyze this. We did some PacBio sequencing a while ago.
– http://nbviewer.jupyter.org/github/sr320/ipython_nb/blob/master/OlyO_PacBio.ipynb

3058 of these reads were >10k bp: http://eagle.fish.washington.edu/cnidarian/OlyO_Pat_PacBio_10k.fa

Those 3058 reads were nicely assembled into 553 contigs: http://eagle.fish.washington.edu/cnidarian/OlyO_Pat_PacBio_10k_contigs.fa

Step forward a bit and all 47475 reads were assembled into the 5362 contigs known as OlyO_Pat_v02.fa http://owl.fish.washington.edu/halfshell/OlyO_Pat_v02.fa

The latter (v02) was used to map the 8 libraries. Roughly getting about 8% mapping

And with a little filtering

Note that awk script filtered for 10x coverage! this could be altered.

and R have an intriguing relationship

## With BGI Draft Genome

Following the same workflow with the BGIv1 scaffolds >10k bp have about 16% or reads map.

3 fold coverage

again, making sure there is 10x coverage at a given CG loci
we get

Much weaker if we allow only 3x coverage at a given CG loci

and the bit of R code

setwd("/Volumes/web-1/halfshell/working-directory/16-04-05")

library(methylKit)

file.list ‘mkfmt_2_CGATGT.txt’,
‘mkfmt_3_TTAGGC.txt’,
‘mkfmt_4_TGACCA.txt’,
‘mkfmt_5_ACAGTG.txt’,
‘mkfmt_6_GCCAAT.txt’,
‘mkfmt_7_CAGATC.txt’,
‘mkfmt_8_ACTTGA.txt’
)

meth<-unite(myobj)
nrow(meth)
getCorrelation(meth,plot=F)
hc PCA<-PCASamples(meth)

# Since you’ve been gone

Soon after Ensenada I went to Chili, SICB, and PAG (in that order). The new year is often of time to let go of lingering projects, and likely I will be doing that soon. But to bring a few pending efforts to the forefront, so that I can analyze etc here is a bit of data that is (or soon will) be coming in.
Much of this is centered around the Ostrea lurida.

The first batch was 2bRAD data.

The full list of samples are here.

These raw data are here.

A quick fastqc….

We also now have a fresh set of MBD-BS.. now out for sequencing.
Pregame here

And just some plain old BS

Details