# EW7: converting metabolite Labels into DataNodes in WikiPathways GPML

Hypothesis: The GPML format contains sufficient information to convert a metabolite encoded as a Label into a DataNode with an identifier.

Start date: 2014-09-04 End date: 2014-09-06

Description:

The GPML format is used by WikiPathways to store pathways internally. The format is human-readable, which makes it possible to add missing information by hand. In particular, metabolites encoded as <Label> elements can be converted into <DataNode> elements. Lists of <Label> elements that are candidates for conversion are produced in other experiments, such as EW6.

For example:

 <Label TextLabel="Acetyl-CoA" GraphId="c7c">
  <Graphics CenterX="150.0" CenterY="640.0" Width="90.33333333333333" Height="19.0" ZOrder="28672" FillColor="ffffff" FontWeight="Bold" FontSize="12" Valign="Middle" />
 </Label>

This can be converted into:

 <DataNode TextLabel="Acetyl-CoA" GraphId="c7c" Type="Metabolite">
  <Graphics CenterX="150.0" CenterY="640.0" Width="90.33333333333333" Height="19.0" ZOrder="28672" FillColor="ffffff" FontWeight="Bold" FontSize="12" Valign="Middle" />
  <Xref Database="ChEBI" ID="CHEBI:15351" />
 </DataNode>

Methods

• Open a WikiPathways page in the MediaWiki edit mode
• Remove the one or more <Label> elements to be converted from their current position
• Change the start and end tags from Label to DataNode
• Add the Type="Metabolite" attribute
• Add an <Xref> child element, preferably with an identifier for that metabolite
• Place the new <DataNode> elements just above the first <Interaction> element
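
The manual steps above can also be sketched as a small script. This is only an illustration using Python's standard xml.etree.ElementTree module, not the method used in this experiment, which was done by hand. The function names, the demo pathway, and the mapping from label text to a (database, identifier) pair are assumptions made for the example. Real GPML files declare an XML namespace, so the sketch reuses whatever namespace the root element carries.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def labels_to_datanodes(root, xrefs):
    """Convert matching <Label> children of root into metabolite <DataNode>s.

    xrefs maps a TextLabel value to a (database, identifier) pair.
    Returns the number of converted elements.
    """
    # Reuse the namespace of the root element, if the file declares one.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    converted = []
    for element in list(root):  # iterate over a copy, root is modified below
        label = element.get("TextLabel")
        if local_name(element.tag) != "Label" or label not in xrefs:
            continue
        database, identifier = xrefs[label]
        root.remove(element)
        element.tag = ns + "DataNode"  # GraphId and <Graphics> stay untouched
        element.set("Type", "Metabolite")
        ET.SubElement(element, ns + "Xref", {"Database": database, "ID": identifier})
        converted.append(element)
    # Place the new DataNodes just above the first Interaction element.
    # Without any Interaction they are appended at the end, which may
    # need manual review against the schema.
    position = next(
        (i for i, child in enumerate(root) if local_name(child.tag) == "Interaction"),
        len(root),
    )
    for offset, element in enumerate(converted):
        root.insert(position + offset, element)
    return len(converted)


# Hypothetical, heavily trimmed pathway used only for this demonstration.
DEMO_GPML = """<Pathway Name="demo">
 <Interaction GraphId="i1" />
 <Label TextLabel="Acetyl-CoA" GraphId="c7c">
  <Graphics CenterX="150.0" CenterY="640.0" />
 </Label>
</Pathway>"""

demo_root = ET.fromstring(DEMO_GPML)
converted_count = labels_to_datanodes(demo_root, {"Acetyl-CoA": ("ChEBI", "CHEBI:15351")})
print(ET.tostring(demo_root, encoding="unicode"))
```

Because only the tag name changes and the attributes are carried over, the GraphId of each element is kept. The result should still be validated against the GPML XML Schema, for example by saving it through the WikiPathways interface.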

Report

Many pathways have been updated using this approach in the past, but I had not previously written up the method I used. In the past few days, these are example pathways updated this way:

When there are many <Label> elements to be converted, I commonly use a plain text editor and its “replace” functionality.

It should be noted that graph identifiers do not get changed, so that links between elements in the GPML are preserved.

Conclusion: This method requires experience with manually editing XML files. The risk is that you break the GPML file, though the WikiPathways interface validates the file against the GPML XML Schema before saving.

# EW6: Finding nodes in Rattus norvegicus pathways with IUPAC names

Hypothesis: Rattus norvegicus pathways in WikiPathways have DataNodes with labels containing IUPAC names which can be tagged as type Metabolite.

Start date: 2014-09-05 End date: 2014-09-05

Description:

WikiPathways entries in GPML have DataNode objects and Label objects. It was found before [here, here] that metabolites can be encoded in pathways as Label objects. They are then not machine-readable as Metabolite-type DataNodes and cannot carry a database identifier. As such, these metabolites are unusable for pathway analysis of metabolomics data.

By processing these GPML files (they are XML-based) and iterating over all Labels, we can attempt to convert each label into a chemical structure with OPSIN. This rests on the assumption that if OPSIN can parse the label into a structure, the label is a chemical name. Each such label will be recorded along with the pathway identifier for manual inspection. For each structure, the script will also look up a ChemSpider identifier.

Methods

Unchanged protocol.

• Download the GPML files from WikiPathways
• Get a working Bioclipse development version (hard) with the OPSIN, InChI, and ChemSpider extensions
• A Groovy script to iterate over the GPML, find <Label> elements
• Each <Label> is parsed with OPSIN and if successful, generate an InChI
• Use the InChIs to find ChemSpider identifiers
• Output all as a text file and open metabolites in a Structure table
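
The core of this protocol, walking over the <Label> elements and keeping those that a name parser accepts, can be sketched as follows. This is not the Groovy script used in the experiment. It is a Python illustration in which the parse_name argument stands in for the OPSIN step, and the toy lookup table, the pathway identifier "WP0000", and the demo GPML are all made up for the example.

```python
import xml.etree.ElementTree as ET


def find_candidate_labels(gpml_text, pathway_id, parse_name):
    """Return (pathway_id, label, structure) for every <Label> that parses.

    parse_name takes the label text and returns a structure string
    (for example an InChI) or None when the text is not a chemical name.
    """
    root = ET.fromstring(gpml_text)
    hits = []
    for element in root.iter():
        # Compare the local name so that namespaced GPML files also work.
        if element.tag.rsplit("}", 1)[-1] != "Label":
            continue
        text = element.get("TextLabel", "")
        structure = parse_name(text)
        if structure is not None:
            hits.append((pathway_id, text, structure))
    return hits


# Toy stand-in for OPSIN: a real run would parse arbitrary IUPAC names.
TOY_NAMES = {"acetic acid": "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"}


def toy_parser(name):
    return TOY_NAMES.get(name.strip().lower())


# Hypothetical pathway fragment with one chemical and one non-chemical label.
DEMO_GPML = """<Pathway Name="demo">
 <Label TextLabel="Acetic acid" GraphId="a1" />
 <Label TextLabel="Cytoplasm" GraphId="a2" />
 <DataNode TextLabel="ACSS2" GraphId="a3" Type="GeneProduct" />
</Pathway>"""

candidates = find_candidate_labels(DEMO_GPML, "WP0000", toy_parser)
for pathway_id, label, structure in candidates:
    # Tab-separated output, one candidate per line, for manual inspection.
    print(f"{pathway_id}\t{label}\t{structure}")
```

Passing the parser in as a function keeps the XML handling separate from the chemistry, so the same loop can be reused for each species' set of pathways.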

Report

As in the experiments for Anopheles gambiae and Homo sapiens, only curated pathways were analyzed, 143 in total, downloaded from WikiPathways.org on August 24. The Groovy script used is detailed in this experiment.

The script found 47 Labels that are possibly metabolites in 8 different rat pathways. The full list was uploaded to Gist.

Conclusion: Rat pathways also include metabolites encoded in GPML <Label> elements.

# EW5: Finding nodes in Homo sapiens pathways with IUPAC names

Hypothesis: Homo sapiens pathways in WikiPathways have DataNodes with labels containing IUPAC names which can be tagged as type Metabolite.

Start date: 2014-09-01 End date: 2014-09-01

Description: WikiPathways entries in GPML have DataNode objects and Label objects. It was found before [here] that metabolites can be encoded in pathways as Label objects. They are then not machine-readable as Metabolite-type DataNodes and cannot carry a database identifier. As such, these metabolites are unusable for pathway analysis of metabolomics data.

By processing these GPML files (they are XML-based) and iterating over all Labels, we can attempt to convert each label into a chemical structure with OPSIN. This rests on the assumption that if OPSIN can parse the label into a structure, the label is a chemical name. Each such label will be recorded along with the pathway identifier for manual inspection. For each structure, the script will also look up a ChemSpider identifier.

Methods

• Download the GPML files from WikiPathways
• Get a working Bioclipse development version (hard) with the OPSIN, InChI, and ChemSpider extensions
• A Groovy script to iterate over the GPML, find <Label> elements
• Each <Label> is parsed with OPSIN and if successful, generate an InChI
• Use the InChIs to find ChemSpider identifiers
• Output all as a text file and open metabolites in a Structure table

Report

As in the experiment for Anopheles gambiae, only curated pathways were analyzed, some 266 in total, downloaded from WikiPathways.org on August 24. The previous Groovy script was updated to point to the human pathways, and also to write the results to a file rather than to STDOUT. The new script was uploaded to myExperiment.org.

The script found 42 Labels that are possibly metabolites. The full list was uploaded to Gist. Again, labels were found which could not be linked to a single ChemSpider ID. For example, “5b-Pregnane-3,20-dione” results in these ChemSpider search hits: 21427590, 389575, 21232692, 21239075, 21237402. The result file also shows a few labels containing newlines.
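
Both issues, ambiguous ChemSpider hits and labels containing newlines, are easy to flag when post-processing the result file. The sketch below is an assumption about how that could be done and is not part of the original script. The function names and the "Pyruvic acid" label are hypothetical, while the list of hits is the one reported above.

```python
def normalize_label(text):
    """Collapse newlines and repeated whitespace in a label into single spaces."""
    return " ".join(text.split())


def classify_hits(hits):
    """Classify a list of ChemSpider IDs as 'none', 'unique' or 'ambiguous'."""
    distinct = set(hits)
    if not distinct:
        return "none"
    return "unique" if len(distinct) == 1 else "ambiguous"


# The five search hits reported above for "5b-Pregnane-3,20-dione".
pregnane_hits = [21427590, 389575, 21232692, 21239075, 21237402]
pregnane_status = classify_hits(pregnane_hits)

# A hypothetical label that was split over two lines in the pathway drawing.
cleaned = normalize_label("Pyruvic\nacid")

print(pregnane_status)
print(cleaned)
```

Labels classified as ambiguous still need manual inspection to pick the intended compound.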

One metabolite was manually confirmed in WP1449: Imidazoquinolin. Interestingly, the Label was visually “connected” with “(anti-viral compounds)”, which has a ChEBI identifier and could be converted to a DataNode of type Metabolite too.

Most work, however, needs to be done in the Tryptophan metabolism pathway (WP465); many metabolites are not properly made machine readable.

Conclusion:

Human pathways also include metabolites encoded in GPML <Label> elements, even in the curated subset.