EW2: Validating and Reviewing RDF for Open PHACTS

Hypothesis: The proprietary RDF is valid and uses common ontologies

Start date: 2014-07-23 End date: 2014-07-23

Description:

For obvious reasons, this experiment will not disclose all details. It will outline, however, the steps I undertook to do the validation and evaluation.

Methods:

  • determine format
  • validate basic syntax
  • test reading and conversion
  • inspect triple structure
  • inspect all used ontologies
  • match results against Open PHACTS habits

Report:

The provided RDF document is in the RDF/XML format. It is well-formed XML, as checked with xmllint (Debian:libxml2-utils):

xmllint --noout file.rdf

The document is not linked to a DTD or XML Schema, as is common for RDF/XML. Parsing the document with rapper (Debian:raptor2-utils) does not reveal problems either:

cat file.rdf | rapper -t -q - . > /dev/null

Using the --count option, 72 triples are found in the sample RDF. The RDF was converted into Turtle with:

rapper -o turtle file.rdf > file.ttl

This resulted in a file with 101 lines.
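The triple count can be cross-checked independently of rapper's own counter. The sketch below assumes the document was first serialized as N-Triples (for example with rapper -o ntriples), where each triple sits on its own line; the sample data and the function name count_ntriples are hypothetical and not taken from the proprietary file.

```python
def count_ntriples(text: str) -> int:
    """Count statements in N-Triples text.

    N-Triples puts one triple per line; blank lines and lines
    starting with '#' (comments) are skipped.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Hypothetical sample, not the proprietary data.
sample = """\
# two example statements
<http://example.org/data/compound/1> <http://purl.org/dc/terms/title> "Example" .
<http://example.org/data/compound/1> <http://example.org/onto#hasAssay> <http://example.org/data/assay/7> .
"""

print(count_ntriples(sample))
```

For the sample RDF discussed here, a count other than 72 would indicate a problem in the conversion step.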

Manual inspection of the Turtle file shows that it has nine resources of five different types. Resources are not formally typed using rdf:type, but the type is clear from the resource IRI. Most properties are provided as literals, including identifiers. The identifiers could instead be given as IRIs, either identifiers.org-style or the RDF IRIs provided by upstream databases. The structure looks reasonable, with one type at the center pointing to the four other types with four different predicates.
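Reading types off resource IRIs can be partly automated. The following is a minimal sketch, assuming (hypothetically) that the type is the path segment just before the local identifier; the actual IRI layout of the proprietary data is not disclosed, so the IRIs and the helper type_from_iri are illustrative only.

```python
from collections import Counter
from urllib.parse import urlparse


def type_from_iri(iri: str) -> str:
    """Assumption: the type is the path segment before the local identifier."""
    segments = [s for s in urlparse(iri).path.split("/") if s]
    return segments[-2] if len(segments) >= 2 else "unknown"


# Hypothetical subject IRIs; the real ones are not disclosed.
subjects = [
    "http://example.org/data/compound/1",
    "http://example.org/data/compound/2",
    "http://example.org/data/assay/7",
    "http://example.org/data/reference/42",
]

# Tally how many resources fall under each inferred type.
type_counts = Counter(type_from_iri(s) for s in subjects)
for type_name, n in sorted(type_counts.items()):
    print(type_name, n)
```

Such a tally is only a convenience for review; explicit rdf:type statements would make the guess unnecessary.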

The document mostly uses a custom, undocumented ontology whose term IRIs have human-readable forms. Common ontologies used include Dublin Core and BIBO. Ontologies were looked up at the BioPortal project page (http://bioportal.bioontology.org/projects/Open_PHACTS); the BioAssay Ontology, ChEBI, and QUDT ontologies, as found there, are not used.
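One way to get an overview of which ontologies a document uses is to tally the namespaces of its predicates. The sketch below assumes N-Triples input (for example from rapper -o ntriples) and splits each predicate IRI at its last '#' or '/'; the sample triples, the example.org namespace, and the function names are hypothetical.

```python
import re
from collections import Counter

# Captures the predicate IRI: the <...> term following the subject,
# which is either an IRI or a blank node label such as _:b0.
PREDICATE = re.compile(r'^\s*(?:<[^>]*>|_:\S+)\s+<([^>]*)>')


def namespace_of(iri: str) -> str:
    """Split off the local name at the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def predicate_namespaces(ntriples: str) -> Counter:
    """Count how often each predicate namespace occurs."""
    counts = Counter()
    for line in ntriples.splitlines():
        match = PREDICATE.match(line)
        if match:  # comment and blank lines do not match
            counts[namespace_of(match.group(1))] += 1
    return counts


# Hypothetical sample, not the proprietary data.
sample_nt = """\
<http://example.org/data/reference/42> <http://purl.org/dc/terms/title> "Example" .
<http://example.org/data/reference/42> <http://purl.org/ontology/bibo/doi> "10.1000/xyz" .
<http://example.org/data/compound/1> <http://example.org/onto#hasAssay> <http://example.org/data/assay/7> .
"""

namespaces = predicate_namespaces(sample_nt)
for ns, n in namespaces.most_common():
    print(n, ns)
```

Namespaces that dominate the tally but resolve to no documentation are the first candidates for replacement by shared ontologies.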

The expected VoID description with provenance information is missing (see the Dataset Descriptions for the Open Pharmacological Space specification).
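To give an impression of what such a description looks like, the sketch below generates a small VoID stub in Turtle. It is not a conformant Open PHACTS dataset description: the specification lists the required properties, and only a handful of well-known VoID and Dublin Core terms are shown here. All IRIs, the title, the license, and the helper name void_stub are hypothetical; only the triple count of 72 comes from the report above.

```python
def void_stub(dataset_iri, title, license_iri, dump_iri, triples, created):
    """Return a minimal VoID description as Turtle text.

    Note: the title is inserted verbatim; quotes inside it are not escaped.
    """
    return f"""\
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<{dataset_iri}> a void:Dataset ;
    dcterms:title "{title}"@en ;
    dcterms:license <{license_iri}> ;
    dcterms:created "{created}"^^xsd:date ;
    void:triples {triples} ;
    void:dataDump <{dump_iri}> .
"""


# Hypothetical values for illustration only.
stub = void_stub(
    dataset_iri="http://example.org/void#dataset",
    title="Example dataset",
    license_iri="http://example.org/license",
    dump_iri="http://example.org/downloads/file.rdf",
    triples=72,
    created="2014-07-23",
)
print(stub)
```

The resulting Turtle can be checked with the same rapper workflow as the data itself.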

Conclusion:

The RDF is in good shape, but can be improved. It is valid and human readable. It should, however, make more use of ontologies already in use within Open PHACTS. Importantly, the data should be complemented with VoID descriptions.