Hypothesis: Rattus norvegicus pathways in WikiPathways have DataNodes with labels containing IUPAC names, which can be tagged as type Metabolite.
Start date: 2014-09-05 End date: 2014-09-05
WikiPathways entries in GPML have DataNode objects and Label objects. It was found before [here, here] that metabolites can be encoded in pathways as Label objects. Such metabolites are not machine-readable as Metabolite-type DataNodes and cannot carry a database identifier. As such, they are unusable for pathway analysis of metabolomics data.
By processing these GPML files (they are XML-based) and iterating over all Labels, we can attempt to convert each label into a chemical structure with OPSIN. This works under the assumption that if OPSIN can parse a label into a structure, the label is a chemical name. Each such label is recorded along with the pathway identifier for manual inspection. For each structure, the script also looks up a ChemSpider identifier.
- Download the GPML files from WikiPathways
- Get a working Bioclipse development version (hard) with the OPSIN, InChI, and ChemSpider extensions
- Use a Groovy script to iterate over the GPML and find <Label> elements
- Parse each <Label> with OPSIN and, if successful, generate an InChI
- Use the InChIs to find ChemSpider identifiers
- Output all as a text file and open metabolites in a Structure table
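The label-extraction step above can be sketched as follows. This is not the Groovy script used in the experiment, which ran inside Bioclipse with OPSIN; it is a minimal Python illustration using only the standard library. It assumes that each GPML <Label> element carries its text in a TextLabel attribute, and it takes the name-to-structure step as a pluggable function. The names label_texts, candidate_metabolites, name_to_inchi, the sample document, and the pathway identifier "WP-example" are all hypothetical; in a real run name_to_inchi would wrap OPSIN, whereas here it is a one-entry lookup table.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, so any GPML schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def label_texts(gpml_xml):
    """Return the non-empty TextLabel value of every <Label> element."""
    root = ET.fromstring(gpml_xml)
    texts = []
    for element in root.iter():
        if local_name(element.tag) == "Label":
            text = (element.get("TextLabel") or "").strip()
            if text:
                texts.append(text)
    return texts


def candidate_metabolites(pathway_id, gpml_xml, name_to_inchi):
    """Keep the labels that the name parser can turn into a structure.

    name_to_inchi is an assumed callable: it returns an InChI string for a
    parsable chemical name and None otherwise (the role OPSIN plays in the
    original experiment).
    """
    hits = []
    for text in label_texts(gpml_xml):
        inchi = name_to_inchi(text)
        if inchi:
            hits.append((pathway_id, text, inchi))
    return hits


# Toy GPML fragment, invented for illustration only.
SAMPLE_GPML = """<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Example">
  <DataNode TextLabel="Adh1" Type="GeneProduct"/>
  <Label TextLabel="ethanol"/>
  <Label TextLabel="Cytosol"/>
</Pathway>"""

# Stand-in for OPSIN: a lookup table that knows a single name.
KNOWN_NAMES = {"ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,1-2H3"}

if __name__ == "__main__":
    for hit in candidate_metabolites("WP-example", SAMPLE_GPML, KNOWN_NAMES.get):
        # One tab-separated line per candidate, ready for manual inspection.
        print("\t".join(hit))
```

Only the "ethanol" label is reported in this toy case; the DataNode is ignored because only <Label> elements are inspected, and "Cytosol" is dropped because the stand-in parser does not recognize it. As in the experiment, a hit is only a candidate and still needs manual inspection.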
As in the experiments for Anopheles gambiae and Homo sapiens, only curated pathways were analyzed: 143 in total, downloaded from WikiPathways.org on August 24. The Groovy script used is detailed in this experiment.
The script found 47 Labels that are possibly metabolites, spread over 8 different rat pathways. The full list was uploaded to Gist.
Conclusion: Rat pathways also include metabolites encoded in GPML <Label> elements.