# DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline MethylKit Comparison

I previously ran two variations on the Bismark analysis for our Olympia oyster whole genome bisulfite sequencing data:

I followed this up using the MethylKit R package to identify differentially modified loci (DML), based on differing amounts of coverage (1x, 3x, 5x, & 10x) and percent methylation differences between the two groups of oysters (25%, 50%, & 75%).

Both sets of analyses were documented in R Projects:

##### RESULTS

BedGraphs (1x coverage, 25% diff in methylation):

###### “Relaxed” Bismark settings:

The BedGraph outputs from the least stringent coverage/percent difference in methylation for both Bismark pipelines yield suprisingly low numbers of DML.

They yield 22 and 21 DML, respectively. Of course, more stringent BedGraphs have fewer DML, but may be more believable due to having a more robust set of data.

Interestingly, the two analyses reveal that a single contig contains the majority of DML, all within a 1000bp range.

Will continue to examine this data by examining Bismark BedGraphs in IGV, and running some additional MethylKit analysis looking at differentially modified regions(DMRs) to see what we can gleen from this.

# Re-Reproducing differential methylation analyses, again

Having just given a talk on reproducibility, I am in the midst of responding to reviewer comments about what we did (12 months ago!) and boy can I say every minute of putting this notebook together was worth it. I even found where we ran the entire notebook, so all result files are easily accessible. Beyond praising Claire, I will document my follow up analysis here.

Essentially the want more quantitative information on differential methylation beyond ..

Makes sense.

Here is what was originally done.

For example the file named linexon contained 16 exon_intersect_DML_lin_u.txt. The 4 files were concatenated to produce lintable ….

and a little awk

awk 'FNR==NR{sum+=$1;next}; {print$0,sum}' lintable{,} > lin_total
awk '{print $2,$1, $3, (($1/\$3)*100)}' lin_total > lineage_DMLs

to create lineage_DMLs

Analogously here are the developmental_DMLs….

And we certainly need to now how many all_CGs we have…

Table

Feature Family specific DMLs Developmental specific DMLs
Transposable Element 17 16
Promoter Region 2 3
Exon 16 12
Intron 25 46

I know we did this before, but I believe the reviewers want a break-down, or list of which specific transposable elements. This is a long shot if I can find this…
2 minutes later https://github.com/sr320/ipython_nb/blob/master/BiGo_larvae_manuscript4.ipynb.

To be sure files are accurate, I will intersectbed again. Based on recollection there is likely not a difference in proportion based on all TEs. This brings up a an important point of how to record “negative” data that does not go into a paper.