Tag Archives: bedtools

Re-Reproducing differential methylation analyses, again

Having just given a talk on reproducibility, I am in the midst of responding to reviewer comments about what we did (12 months ago!) and boy can I say every minute of putting this notebook together was worth it. I even found where we ran the entire notebook, so all result files are easily accessible. Beyond praising Claire, I will document my follow up analysis here.

Essentially the want more quantitative information on differential methylation beyond ..


Makes sense.

Here is what was originally done.


For example the file named linexon contained 16 exon_intersect_DML_lin_u.txt. The 4 files were concatenated to produce lintable ….


and a little awk

awk 'FNR==NR{sum+=$1;next}; {print $0,sum}' lintable{,} > lin_total
awk '{print $2, $1, $3, (($1/$3)*100)}' lin_total > lineage_DMLs

to create lineage_DMLs


Analogously here are the developmental_DMLs….


And we certainly need to now how many all_CGs we have…



Feature Family specific DMLs Developmental specific DMLs
Transposable Element 17 16
Promoter Region 2 3
Exon 16 12
Intron 25 46

I know we did this before, but I believe the reviewers want a break-down, or list of which specific transposable elements. This is a long shot if I can find this…
2 minutes later https://github.com/sr320/ipython_nb/blob/master/BiGo_larvae_manuscript4.ipynb.

To be sure files are accurate, I will intersectbed again. Based on recollection there is likely not a difference in proportion based on all TEs. This brings up a an important point of how to record “negative” data that does not go into a paper.


Side track with tracks

Today working on our paper looking at heat stress and DNA methylation I dived deeper into the array data in the search for what should be called a DMR.

As a refresher we have tracks from the core that have 1.8+ fold difference (sig) and complementary tracks where there are three adjacents (3plusAdjacent). I made tracks where I merged the latter into a single feature when within 100bp of each other.

In order to see if there is any consistency across oysters..

#concatenated tracks
> /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcat.bed

#then using bedtools merge features (though first had to sort)
!bedtools sort -i /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcat.bed 
> /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcatsort.bed
!bedtools merge -c 2 -o count 
-i /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcatsort.bed | sort -nrk4

and so on for the hypermethylated region.

end of the AM, left with a new track

scaffold481 576986  578532  -3
scaffold247 141885  142442  -3
scaffold1518    212680  213736  -3
scaffold853 46186   46496   -2
scaffold406 419330  419384  -2
scaffold406 419005  419060  -2
scaffold406 418360  418767  -2
scaffold394 555813  556224  -2
scaffold247 144031  144583  -2
scaffold242 75918   76344   -2
scaffold142 656144  656735  -2
scaffold12  243960  244376  -2
scaffold257 1235165 1235481 +2

Jupyter Notebook

Could also do this on a less conservative approach by acting on (sig) tracks in bedtools


Wayback to just-MBD

Prior to bisulfite sequencing we did do a couple of MBD enrichment libraries to describe DNA methylation in oysters. Results even were snuck into this perspective.


While I am sure there are genome tracks around, I am ending up #doingitagain.

In short I took the raw Solid reads, align to Crassostrea_gigas.GCA_000297895.1.26.dna.genome in CLC, exported bam, converted to bedgraph, converted to tdf.

In long:
The raw files

1) Imported into CLC v8.0.1

          Discard read names = Yes
          Discard quality scores = No
          Original resource = /Users/sr320/data-genomic/tentacle/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_METH/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SB_MOTH.qual
          Original resource = /Users/sr320/data-genomic/tentacle/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_METH/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_MOTH.csfasta

(yes the core called them MOTH)

2) Reads were mapped


3) Exported as BAM.

4) Converted to bedgraph

-ibam /Users/sr320/data-genomic/tentacle/solid0078_moth.bam 
-g /Volumes/web/halfshell/qdod3/Cg.GCA_000297895.1.25.dna_sm.toplevel.genome 
> /Users/sr320/data-genomic/tentacle/MBD-meth.bedgraph          

5) Converted to toTDF


Rinse and repeat with unmethylated fraction (UNMOTH) and import tdf into IGV!