In our original effort (bioRxiv doi: http://dx.doi.org/10.1101/012831) we characterized DMLs in relation to genomic regions, including putative transposable elements (TEs). In that analysis, TEs were defined as both tandem repeats and targets with sequence similarity to protein sequences in other species.
Transposable elements were identified using RepeatMasker, a program that screens and annotates interspersed repeats (Smit et al., 1996-2010). Specifically, RepeatProteinMask was used with Repbase, which contained 7,445 peptide sequences. RepeatProteinMask also uses Tandem Repeats Finder (Benson, 1999) to identify tandem repeats, which were included in the genome feature track. A total of 119,786 features are in the transposable element genome feature file used for analysis in this paper, including 61,319 tandem repeat regions and 58,467 transposable elements identified based on sequence similarity.
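To make the composition of the feature track concrete, here is a minimal sketch of how such a file could be split into its two classes and counted. It assumes a GFF-style, tab-delimited layout in which column 3 holds the feature type. The labels `TandemRepeat` and `Transposon`, the function name `split_features`, and the example records are all hypothetical and are not taken from the actual track.

```python
def split_features(gff_lines, tandem_label="TandemRepeat"):
    """Partition GFF-style lines into tandem repeats and all other features.

    Assumption: column 3 (the feature type) distinguishes the two classes,
    and `tandem_label` is a hypothetical label for tandem repeats.
    """
    tandem, similarity = [], []
    for line in gff_lines:
        # Skip blank lines and GFF header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] == tandem_label:
            tandem.append(fields)
        else:
            similarity.append(fields)
    return tandem, similarity


# Synthetic records for illustration only (not real annotations).
example = [
    "##gff-version 3",
    "scaffold1\tRepeatProteinMask\tTandemRepeat\t100\t180\t.\t+\t.\tID=tr1",
    "scaffold1\tRepeatProteinMask\tTransposon\t500\t900\t.\t-\t.\tID=te1",
    "scaffold2\tRepeatProteinMask\tTransposon\t40\t400\t.\t+\t.\tID=te2",
]
tandem, similarity = split_features(example)
print(len(tandem), len(similarity), len(tandem) + len(similarity))
```

Run against the real track, with the labels adjusted to whatever the file actually uses, the two counts should sum to the total number of features reported above.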
The analysis was rerun with targets from RepeatProteinMask (via WUBLASTX) and tandem repeats examined separately. In this reanalysis, TEs defined as those identified via RepeatProteinMask were more likely to possess lineage-specific DMLs.
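The comparison between the two feature classes amounts to counting how many DMLs fall within each set of regions. The sketch below illustrates that idea only; it is not the pipeline used for the reanalysis. The function name `count_overlaps`, the 1-based inclusive coordinates, and all example values are assumptions made for illustration.

```python
def count_overlaps(loci, features):
    """Count loci that fall inside at least one feature.

    loci:     iterable of (chrom, pos) tuples
    features: iterable of (chrom, start, end) tuples, 1-based inclusive
              (an assumed convention, matching GFF coordinates)
    """
    # Group feature intervals by chromosome/scaffold for lookup.
    by_chrom = {}
    for chrom, start, end in features:
        by_chrom.setdefault(chrom, []).append((start, end))
    # A locus is counted once even if it lies in several features.
    return sum(
        any(start <= pos <= end for start, end in by_chrom.get(chrom, []))
        for chrom, pos in loci
    )


# Synthetic values for illustration only.
dmls = [("scaffold1", 150), ("scaffold1", 600), ("scaffold2", 50), ("scaffold3", 10)]
tandem_repeats = [("scaffold1", 100, 180)]
similarity_tes = [("scaffold1", 500, 900), ("scaffold2", 40, 400)]

n_tandem = count_overlaps(dmls, tandem_repeats)
n_similarity = count_overlaps(dmls, similarity_tes)
print(n_tandem, n_similarity)
```

The linear scan over intervals keeps the sketch short; for genome-scale tracks a dedicated interval tool or a sorted-interval search would be the more practical choice.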
The GitHub repo documenting this effort has been updated to redefine the TE track as only those regions identified by RepeatProteinMask via sequence similarity. This makes more sense, as it is difficult to determine whether a tandem repeat is associated with an actual transposable element.