Tag Archives: bedtools

Re-Reproducing differential methylation analyses, again

Having just given a talk on reproducibility, I am in the midst of responding to reviewer comments about what we did (12 months ago!) and boy can I say every minute of putting this notebook together was worth it. I even found where we ran the entire notebook, so all result files are easily accessible. Beyond praising Claire, I will document my follow up analysis here.

Essentially the want more quantitative information on differential methylation beyond ..

OlsonandRoberts2015_9_docx_1BC2FBAE.png

Makes sense.

Here is what was originally done.

olson-ms-nb_BiGo_dev_ipynb_at_master_·_che625_olson-ms-nb_1BC2FCC5.png

For example the file named linexon contained 16 exon_intersect_DML_lin_u.txt. The 4 files were concatenated to produce lintable ….

eagle_fish_washington_edu_cnidarian_olson-ms-nb-master_12_1714_wd_lintable_1BC2FE30.png

and a little awk

awk 'FNR==NR{sum+=$1;next}; {print $0,sum}' lintable{,} > lin_total
awk '{print $2, $1, $3, (($1/$3)*100)}' lin_total > lineage_DMLs

to create lineage_DMLs

eagle_fish_washington_edu_cnidarian_olson-ms-nb-master_12_1714_wd_lineage_DMLs_1BC2FE65.png


Analogously here are the developmental_DMLs….

eagle_fish_washington_edu_cnidarian_olson-ms-nb-master_12_1714_wd_developmental_DMLs_1BC3F775.png


And we certainly need to now how many all_CGs we have…

eagle_fish_washington_edu_cnidarian_olson-ms-nb-master_12_1714_wd_all_CGs_1BC3F807.png


Table

Feature Family specific DMLs Developmental specific DMLs
Transposable Element 17 16
Promoter Region 2 3
Exon 16 12
Intron 25 46

I know we did this before, but I believe the reviewers want a break-down, or list of which specific transposable elements. This is a long shot if I can find this…
2 minutes later https://github.com/sr320/ipython_nb/blob/master/BiGo_larvae_manuscript4.ipynb.


To be sure files are accurate, I will intersectbed again. Based on recollection there is likely not a difference in proportion based on all TEs. This brings up a an important point of how to record “negative” data that does not go into a paper.

Share

Side track with tracks

Today working on our paper looking at heat stress and DNA methylation I dived deeper into the array data in the search for what should be called a DMR.

As a refresher we have tracks from the core that have 1.8+ fold difference (sig) and complementary tracks where there are three adjacents (3plusAdjacent). I made tracks where I merged the latter into a single feature when within 100bp of each other.

In order to see if there is any consistency across oysters..

#concatenated tracks
!cat 
/Volumes/web/halfshell/2015-05-comgenbro/2M_3plusmerge_Hypo.bed 
/Volumes/web/halfshell/2015-05-comgenbro/4M_3plusmerge_Hypo.bed 
/Volumes/web/halfshell/2015-05-comgenbro/6M_3plusmerge_Hypo.bed 
> /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcat.bed

#then using bedtools merge features (though first had to sort)
!bedtools sort -i /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcat.bed 
> /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcatsort.bed
!bedtools merge -c 2 -o count 
-i /Users/sr320/git-repos/paper-Temp-stress/ipynb/analyses/mergHYPOcatsort.bed | sort -nrk4

and so on for the hypermethylated region.

end of the AM, left with a new track

scaffold481 576986  578532  -3
scaffold247 141885  142442  -3
scaffold1518    212680  213736  -3
scaffold853 46186   46496   -2
scaffold406 419330  419384  -2
scaffold406 419005  419060  -2
scaffold406 418360  418767  -2
scaffold394 555813  556224  -2
scaffold247 144031  144583  -2
scaffold242 75918   76344   -2
scaffold142 656144  656735  -2
scaffold12  243960  244376  -2
scaffold257 1235165 1235481 +2

Jupyter Notebook


Could also do this on a less conservative approach by acting on (sig) tracks in bedtools

Share

Wayback to just-MBD

Prior to bisulfite sequencing we did do a couple of MBD enrichment libraries to describe DNA methylation in oysters. Results even were snuck into this perspective.

mbd

While I am sure there are genome tracks around, I am ending up #doingitagain.

In short I took the raw Solid reads, align to Crassostrea_gigas.GCA_000297895.1.26.dna.genome in CLC, exported bam, converted to bedgraph, converted to tdf.


In long:
The raw files
raw

1) Imported into CLC v8.0.1

          Discard read names = Yes
          Discard quality scores = No
          Original resource = /Users/sr320/data-genomic/tentacle/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_METH/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SB_MOTH.qual
          Original resource = /Users/sr320/data-genomic/tentacle/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_METH/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_SB_MOTH.csfasta

(yes the core called them MOTH)

2) Reads were mapped

mapped

3) Exported as BAM.

4) Converted to bedgraph

!/Applications/bioinfo/bedtools2/bin/genomeCoverageBed 
-bg 
-ibam /Users/sr320/data-genomic/tentacle/solid0078_moth.bam 
-g /Volumes/web/halfshell/qdod3/Cg.GCA_000297895.1.25.dna_sm.toplevel.genome 
> /Users/sr320/data-genomic/tentacle/MBD-meth.bedgraph          

5) Converted to toTDF

tdf


Rinse and repeat with unmethylated fraction (UNMOTH) and import tdf into IGV!

Share