Tag Archives: methylation

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline MethylKit Comparison

I previously ran two variations on the Bismark analysis for our Olympia oyster whole genome bisulfite sequencing data:

I followed this up using the MethylKit R package to identify differentially modified loci (DML), based on differing amounts of coverage (1x, 3x, 5x, & 10x) and percent methylation differences between the two groups of oysters (25%, 50%, & 75%).

See the project wiki for experimental design info).

Both sets of analyses were documented in R Projects:

Default Bismark settings:
“Relaxed” Bismark settings

RESULTS

BedGraphs (1x coverage, 25% diff in methylation):

Default Bismark settings:
“Relaxed” Bismark settings:

The BedGraph outputs from the least stringent coverage/percent difference in methylation for both Bismark pipelines yield suprisingly low numbers of DML.

They yield 22 and 21 DML, respectively. Of course, more stringent BedGraphs have fewer DML, but may be more believable due to having a more robust set of data.

Interestingly, the two analyses reveal that a single contig contains the majority of DML, all within a 1000bp range.

Will continue to examine this data by examining Bismark BedGraphs in IGV, and running some additional MethylKit analysis looking at differentially modified regions(DMRs) to see what we can gleen from this.

Share

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline Comparison

Ran Bismark using our high performance computing (HPC) node, Mox, with two different bowtie2 settings:

  1. Default settings

  2. –score_min L,0,-0.6

The second setting is a bit less stringent than the default settings and should result in a higher percentage of reads mapping. However, not entirely sure what the actual implications will be (if any) for interpreting the resulting data.

Input data was previously trimmed per Bismark’s recommendation for Illumina TruSeq libraries (TrimGalore! 5′ 10bp):

List of input files and Bismark configurations can be seen in the SLURM scripts:


RESULTS

Output folders:

Share

DNA Methylation Analysis – Olympia oyster BSseq MethylKit Analysis

NOTE: IMPORTANT CAVEATS – READ POST BEFORE USING DATA.
I’m posting this for posterity and to provide an overview of what (and whatnot) to do. Plus, this has a good R script for using MethylKit that can be used for subsequent analyses.

The goal of this analysis was to compare the methylation profiles of Olympia oysters originating from a common population (Fidalgo Bay) that were raised in two different locations (Fidalgo Bay & Oyster Bay).

An overview of the experiment can be viewed here:

I previously ran all of Olympia oyster DNA methylation sequencing data through the Bismark pipeline, and then processed them using the MethylKit R library.

First mistake (Bismark):

  • Trimmed FastQ files “incorrectly”.

Bismark provides an excellent user guide and provides a handy table on how to decide on trimming parameters, but I mistakenly trimmed these according to the recommendations for a different library preparation technique. I trimmed based on the Zymo Pico-Methyl Kit (which was used for the other group of data that I processed simultaneously), instead of the TruSeq library prep.

So, “incorrectly” isn’t necessarily the proper term here. The analysis can still be used, however, it’s likely that the excessive trimming results in reducing sequencing coverage, and, in turn, making the downstream analysis result in a highly conservative output. Thus, the data isn’t wrong or bad, it is just very limited.

And, this leads to the second mistake (Bismark):

  • Bowtie alignment score too strict

There’s a bit of a weird “battle” between Bismark and bowtie2. Bismark uses bowtie2 for generating alignments, but bowtie2’s default cutoff score overrides Bismark’s. So, to adjust the score value, you have to explicitly add the scoring parameters to your Bismark parameters during the alignment step. I did not do this.

Again, it’s not wrong, per se, but leads to a significantly limited set of data in the final analysis.

The data were analyzed based on a minimum of:

  • 3x coverage

  • 25% difference in methylation


RESULTS:

Methylkit analysis (R project; GitHub):

BedGraph file (BED):

The analysis resulted in a total of seven (yes, 7) differentially methylated loci (DML) between the two groups. It was this result that made Steven and me revisit the initial Bismark analysis. He has done this previously (but differently) and gotten significantly greater numbers of DML.

Knowing all of this, I will re-trim the data and adjust Bismark alignment score thresholds and then re-analyze with MethylKit.

Regardless here’re some plots to add some visual flair to this notebook entry (these, and more, are available in the GitHub repo):

CLUSTERING DENDROGRAM


PCA PLOT

Share

DNA Methylation Analysis – Bismark Pipeline on All Olympia oyster BSseq Datasets

Bismark analysis of all of our current Olympia oyster (Ostrea lurida) DNA methylation high-throughput sequencing data.

Analysis was run on Emu (Ubuntu 16.04LTS, Apple Xserve). The primary analysis took ~14 days to complete.

All operations are documented in a Jupyter notebook (GitHub):

Genome used:


Input files ( see Olympia oyster Genomic GitHub wiki for more info ):

WG BSseq of Fidalgo Bay offspring grown in Fidalgo Bay & Oyster Bay
  • 1_ATCACG_L001_R1_001.fastq.gz

  • 2_CGATGT_L001_R1_001.fastq.gz

  • 3_TTAGGC_L001_R1_001.fastq.gz

  • 4_TGACCA_L001_R1_001.fastq.gz

  • 5_ACAGTG_L001_R1_001.fastq.gz

  • 6_GCCAAT_L001_R1_001.fastq.gz

  • 7_CAGATC_L001_R1_001.fastq.gz

  • 8_ACTTGA_L001_R1_001.fastq.gz

MBDseq of two populations (Hood Canal & Oyster Bay) grown in Clam Bay
  • zr1394_10_s456.fastq.gz

  • zr1394_11_s456.fastq.gz

  • zr1394_12_s456.fastq.gz

  • zr1394_13_s456.fastq.gz

  • zr1394_14_s456.fastq.gz

  • zr1394_15_s456.fastq.gz

  • zr1394_16_s456.fastq.gz

  • zr1394_17_s456.fastq.gz

  • zr1394_18_s456.fastq.gz

  • zr1394_1_s456.fastq.gz

  • zr1394_2_s456.fastq.gz

  • zr1394_3_s456.fastq.gz

  • zr1394_4_s456.fastq.gz

  • zr1394_5_s456.fastq.gz

  • zr1394_6_s456.fastq.gz

  • zr1394_7_s456.fastq.gz

  • zr1394_8_s456.fastq.gz

  • zr1394_9_s456.fastq.gz


RESULTS:

With Bismark complete, these two sets of analyses can now be looked into further (and separately, as they are separate experiments) using things like MethylKit (R package) and
the Integrative Genomics Viewer (IGV).

Output folder:

Bismark Summary Report:

Individual Sample Reports:

Share

DNA Methylation Quantification – Acropora cervicornis (Staghorn coral) DNA from Javier Casariego (FIU)

Used the MethylFlash Methylated DNA Quantification Kit (Colorimetric) from Epigentek to quantify methylation in these coral DNA samples.

All samples were run in duplicate <em>except</em> 2h Block 1 due to insufficient DNA.

The following samples were used in a 1:10 dilution (2uL DNA : 18uL NanoPure H2O), due to their relatively high concentrations, to ensure accurate pipetting:

  • 72h Block 4
  • D14 Block 1
  • D14 Block 2
  • D14 Block 3
  • D14 Block 4
  • D14 Block 5
  • D14 Block 6
  • D14 Block 8
  • D14 Block 10

All samples were diluted to a final concentration of 9.645ng/uL (154.24ng total; 17.6uL) in NanoPure water, which is equal to 77.12ng of DNA per assay replicate. These numbers were chosen based off of the sample with the lowest concentration.

The following samples were used in their entirety:

  • 2h Block 8
  • D35 Block 8

Calculations were added to the spreadsheet provided by Javier (Google Sheet): A.cervicornis_DNA_Extractions(May_2017).xlsx

The spreadsheet became overly complicated because I initially forgot to account for the need to run each sample in duplicate.

The kit reagent dilutions were as follows:

  • Diluted ME1: 52mL of ME1 + 464mL of <em>distilled</em> water
  • Diluted ME4: 10uL of ME4 + 10uL of TE Buffer (pH=8.0; made by me on 20130408).
  • Standard curve: Prepped per instruction manual, with double volumes for two plates.
  • Diluted ME5: 50uL/well x 152well = 7600uL; 7600uL/1000 = 7.6uL; 7.6uL ME5 + 7592.4uL Diluted ME1
  • Diluted ME6: 50uL/well x 152well = 7600uL; 7600uL/2000 = 3.8uL; 3.8uL ME6 + 7596.2uL Diluted ME1
  • Diluted ME7: 50uL/well x 152well = 7600uL; 7600uL/5000 = 1.52uL; 1.52uL ME7 + 7598.48uL Diluted ME1

All diluted solutions were stored on ice for duration of procedure.

The remaining Diluted ME1 solution was stored at 4C (FTR 209), and is stable for 6 months, per the manufacturer’s instructions.

See the Results section below for plate layouts.

Plates were read at 450nm on the Seeb Lab Victor 1420 Plate Reader (Perkin Elmer) and the amount of DNA methylation was determined.

Results:

Individual sample methylation quantification (Google Sheet): A.cervicornis_DNA_Extractions(May_2017).xlsx

Plate Reader Output File Plate #1 (Google Sheet): 20170511_coral_DNA_methylation_plate01.xls

Plate Reader Output File Plate #2 (Google Sheet): 20170511_coral_DNA_methylation_plate02.xls

 

I’m not familiar with the experimental design, so I’m not going to spend time handling any of the in-depth analysis at this point in time. However, here’s the background on how methylation quantification and percent methylation were determined.

  1. Mean absorbance (450nm) was determined for all samples and standard curve samples. It’s important to note that the standard deviation between replicates was not evaluated and there appears to be consistent variability between samples, but I’m not certain how much variation is “acceptable” with and assay of this nature.

  2. The mean absorbance of the standard curve samples were plotted against their corresponding DNA amounts and a linear trendline was fitted to the points.

  3. Per the manufacturer’s recommendations, the four points (including the zero point) that yielded the best linear fit (i.e. best R^2 value) were used and the slope of best fit line for those four points was determined.

  4. This slope was then utilized in the equation provided by the manufacturer (see pg. 8 of the MethylFlash Kit manual).

Share

DNA Methylation Quantification – Coral DNA from Jose M. Eirin-Lopez (Florida International University)

Ran the coral DNA I quantified on 20160630 through the MethylFlash Methylated DNA Quantification Kit [Colorimetric] (Epigentek) kit to quantify global methylation.

Used 100ng of DNA per 8uL per replicate (x2 replicates = total 200ng in 16uL). Calcs are here (Google Sheet): 20160705_coral_DNA_methylation_calcs

Manufacturer’s protocol was followed.

Dilutions of kit reagents:

ME5 (1:1000) 2.6uL ME5 + 2597.4uL diluted ME1

ME6 (1:2000) 1.3uL ME6 + 2598.7uL diluted ME1

ME7 (1:5000) 0.52uL ME7 + 2599.48uL diluted ME1

Samples were quantified on the Seeb’s plate reader @ 450nm  (Wallac 1420 Victor 2  [Perkin Elmer])

Results:

Google Sheet: 20160707_coral_DNA_methylflash

sample treatment 5-mC(ng)
H1_1 nitrogen 0.8712248853
H1_10 nitrogen 0.6917168368
H1_12 control 0.2738478893
H1_5 nitrogen & phosphorous 0.9663585942
H1_6 control 0.6494783825
H1_8 nitrogen & phosphorous 0.4244913398
H24_1 nitrogen 0.372603297
H24_10 nitrogen 0.4237237786
H24_12 control 0.5350511937
H24_5 nitrogen & phosphorous 0.1495527697
H24_6 control 0.2291900804
H24_8 nitrogen & phosphorous 0.2213437801
H5_1 nitrogen -0.1233169902
H5_10 nitrogen 0.6997668774
H5_12 control 0.2307000493
H5_5 nitrogen & phosphorous -0.07790933048
H5_6 control 0.4562401662
H5_8 nitrogen & phosphorous 0.5949647121

 

Overall, it’s difficult to really interpret these results. I believe the data is a time course (e.g. H5 = hour 5, H24 = hour 24). Additionally, looking at treatments, there appear to be replicates, but it’s not clear what type of replicates they are (i.e. technical or biological). Generally, it seems like the control samples have lower quantities of methylated DNA than the treated samples. However, this doesn’t hold true for all three of the groups.

And, not that it really matters, but I don’t even know what species this is…

In any case, this was an attempt to gather some preliminary data for a grant that Steven is attempting to put together, so the original experiment and the subsequent data aren’t as robust as one would expect for a full-blown research project.

Share

Epigenetic variation of two populations grown at common site

In a different experiment compared to when Fidalgo siblings were outplanted at two sites, we also examined Hood Canal (HC) and Oyster Bay (SS/South Sound) grown at Clam Bay (Manchester). Descriptor.

These were the oysters Katherine Silliman spawned in the summer of 2015 and represent seed Jake outplanted years ago.

This was run against the BGI scaffolds >10k.
BSMAP-06-BGIv002_1CC19CB1.png

The results are quite interesting.
RStudio_1CC19CEC.png

The full notebook can be found at https://github.com/sr320/nb-2016/blob/master/O_lurida/BSMAP-06-BGIv001.ipynb.

Share

Fidalgo offspring at two locations

We carried out whole genome BS-Seq on siblings outplanted out at two sites: Fidalgo Bay (home) and Oyster Bay. Four individuals from each locale were examined.

A running description of the data is available @ https://github.com/RobertsLab/project-olympia.oyster-genomic/wiki/Whole-genome-BSseq-December-2015.

I need to look back to a genome to analyze this. We did some PacBio sequencing a while ago.
– http://nbviewer.jupyter.org/github/sr320/ipython_nb/blob/master/OlyO_PacBio.ipynb

In recap, the fastq file had 47,475 reads: http://owl.fish.washington.edu/halfshell/OlyO_Pat_PacBio_1.fastq

3058 of these reads were >10k bp: http://eagle.fish.washington.edu/cnidarian/OlyO_Pat_PacBio_10k.fa

Those 3058 reads were nicely assembled into 553 contigs: http://eagle.fish.washington.edu/cnidarian/OlyO_Pat_PacBio_10k_contigs.fa


Step forward a bit and all 47475 reads were assembled into the 5362 contigs known as OlyO_Pat_v02.fa http://owl.fish.washington.edu/halfshell/OlyO_Pat_v02.fa

The latter (v02) was used to map the 8 libraries. Roughly getting about 8% mapping
BSMAP-03b-Genomev2-10x_1CB41B65.png

About 15 fold average coverage
BSMAP-03b-Genomev2-10x_1CB41B7A.png

And with a little filtering
BSMAP-03b-Genomev2-10x_1CB41B9E.png

Note that awk script filtered for 10x coverage! this could be altered.

and R have an intriguing relationship
BSMAP-03b-Genomev2-10x_1CB41BC9.png

With BGI Draft Genome

Following the same workflow with the BGIv1 scaffolds >10k bp have about 16% or reads map.
BSMAP-05-BGIv001_1CB41C8D.png

3 fold coverage
BSMAP-05-BGIv001_1CB41CB3.png

again, making sure there is 10x coverage at a given CG loci
we get
RStudio_1CB41F50.png

Much weaker if we allow only 3x coverage at a given CG loci
RStudio_1CB421EC.png

and the bit of R code

setwd("/Volumes/web-1/halfshell/working-directory/16-04-05")

library(methylKit)

file.list ‘mkfmt_2_CGATGT.txt’,
‘mkfmt_3_TTAGGC.txt’,
‘mkfmt_4_TGACCA.txt’,
‘mkfmt_5_ACAGTG.txt’,
‘mkfmt_6_GCCAAT.txt’,
‘mkfmt_7_CAGATC.txt’,
‘mkfmt_8_ACTTGA.txt’
)

myobj=read(file.list,sample.id=list(“1″,”2″,”3″,”4″,”5″,”6″,”7″,”8″),assembly=”Pat10k”,treatment=c(0,0,0,0,1,1,1,1))

meth<-unite(myobj)
head(meth)
nrow(meth)
getCorrelation(meth,plot=F)
hc PCA<-PCASamples(meth)

Share

Since you’ve been gone

Soon after Ensenada I went to Chili, SICB, and PAG (in that order). The new year is often of time to let go of lingering projects, and likely I will be doing that soon. But to bring a few pending efforts to the forefront, so that I can analyze etc here is a bit of data that is (or soon will) be coming in.
Much of this is centered around the Ostrea lurida.


The first batch was 2bRAD data.
sheet

The full list of samples are here.

sc

These raw data are here.

A quick fastqc….


We also now have a fresh set of MBD-BS.. now out for sequencing.
Pregame here

pic


And just some plain old BS
pic

Details

Share