EW2: Validating and Reviewing RDF for Open PHACTS

Hypothesis: The proprietary RDF is valid and uses common ontologies

Start date: 2014-07-23 End date: 2014-07-23

Description:

For obvious reasons, this experiment will not disclose all details. It will outline, however, the steps I undertook to do the validation and evaluation.

Methods:

• determine format
• validate basic syntax
• inspect triple structure
• inspect all used ontologies
• match results against Open PHACTS habits

Report:

The provided RDF document is in the RDF/XML format. It validates as well-formed XML, with xmllint (Debian:libxml2-utils):

xmllint --noout file.rdf

The document is not linked to a DTD or XML Schema (as is common with RDF/XML). Parsing the file with rapper (Debian:raptor2-utils) does not find problems either:

cat file.rdf | rapper -t -q - . > /dev/null

Using the --count option, 72 triples are found in the sample RDF. The RDF was converted into Turtle with:

rapper -o turtle file.rdf > file.ttl

This resulted in a file with 101 lines.
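The steps up to this point can be gathered into one small shell function. This is only a sketch, assuming xmllint and rapper are installed; the function name validate_rdf and its exit codes are hypothetical and chosen for illustration.

```shell
#!/bin/sh
# Sketch: run the checks described above on one RDF/XML file.
# validate_rdf and its exit codes are hypothetical, not part of any tool.
validate_rdf() {
    file="$1"
    # Refuse to continue when the input cannot be read.
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 2; }
    # Both tools must be installed (Debian: libxml2-utils, raptor2-utils).
    for tool in xmllint rapper; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 127; }
    done
    # 1. Is the file well-formed XML?
    xmllint --noout "$file" || return 1
    # 2. Does it parse as RDF/XML? --count reports the number of triples.
    rapper --count "$file" || return 1
    # 3. Convert to Turtle for manual inspection and print its line count.
    rapper -q -o turtle "$file" > "${file%.rdf}.ttl" || return 1
    wc -l < "${file%.rdf}.ttl"
}
```

It would be called as validate_rdf file.rdf; a non-zero exit status indicates which step failed first.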

Manual inspection of the Turtle file shows that it has nine resources of five different types. Resources are not formally typed using rdf:type, but the type is clear from the resource IRI. Most properties are provided as literals, including identifiers. The latter could use identifiers.org style identifiers, or RDF IRIs provided by upstream databases. The structure looks reasonable, with one type at the center, pointing to the four other types with four different predicates.
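Part of this inspection can be supported with line-based tools, because N-Triples puts exactly one triple on each line. The sketch below counts how often each predicate occurs and how many explicit rdf:type statements exist. The sample data, the example.org IRIs, and the function names are hypothetical; with the real file, the sample would be replaced by the output of rapper -q -o ntriples file.rdf.

```shell
#!/bin/sh
# Hypothetical sample standing in for: rapper -q -o ntriples file.rdf > sample.nt
cat > sample.nt <<'EOF'
<http://example.org/assay/1> <http://example.org/onto#hasCompound> <http://example.org/compound/7> .
<http://example.org/assay/1> <http://purl.org/dc/terms/title> "Example assay" .
<http://example.org/compound/7> <http://example.org/onto#casNumber> "71-43-2" .
<http://example.org/compound/7> <http://purl.org/dc/terms/title> "benzene" .
EOF

# Count how often each predicate occurs (field 2 of every N-Triples line).
predicate_counts() {
    awk '{ print $2 }' "$1" | sort | uniq -c | sort -rn
}

# Count explicit rdf:type statements; 0 means no resource is formally typed.
count_typed() {
    grep -c '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' "$1"
}

predicate_counts sample.nt
count_typed sample.nt || true   # grep exits with status 1 when the count is 0
```

In this sample, the identifier is a plain literal and no resource carries an rdf:type, which mirrors the two observations made above.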

The document mostly uses a custom, undocumented ontology whose term IRIs are human-readable. Common ontologies used include Dublin Core and BIBO. Ontologies were looked up at the BioPortal project page (http://bioportal.bioontology.org/projects/Open_PHACTS); the BioAssay Ontology, ChEBI, and QUDT ontologies, as found there, are not used.
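Which ontologies a file draws on can be approximated by listing the namespaces of its predicates. The following is a minimal sketch working on N-Triples; the sample triples, the example.org IRIs, and the function name predicate_namespaces are hypothetical, and the namespace is simply assumed to be everything up to the last '/' or '#'.

```shell
#!/bin/sh
# Hypothetical sample standing in for: rapper -q -o ntriples file.rdf > terms.nt
cat > terms.nt <<'EOF'
<http://example.org/doc/1> <http://purl.org/dc/terms/title> "Example" .
<http://example.org/doc/1> <http://purl.org/ontology/bibo/doi> "10.1000/example" .
<http://example.org/doc/1> <http://example.org/onto#hasAssay> <http://example.org/assay/1> .
EOF

# List the distinct namespaces of all predicates: drop the leading '<',
# then strip the local name (everything after the last '/' or '#').
predicate_namespaces() {
    awk '{ print $2 }' "$1" | sed -e 's|^<||' -e 's|[^/#]*>$||' | sort -u
}

predicate_namespaces terms.nt
```

The resulting list can then be compared by hand against the ontologies listed on the BioPortal project page.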

The expected VoID descriptions with provenance information are missing (see the Dataset Descriptions for the Open Pharmacological Space specification).
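To give an idea of what such a description could look like, the sketch below writes a minimal VoID file in Turtle. All IRIs, the title, the description, and the date are placeholders, and the choice of properties (Dublin Core terms, PAV, VoID) is an assumption; which properties are actually required should be checked against the specification itself. Only the triple count of 72 comes from the report above.

```shell
#!/bin/sh
# Sketch of a minimal VoID description; every IRI and literal except the
# triple count is a placeholder, and the property selection is an assumption.
cat > void.ttl <<'EOF'
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix pav:     <http://purl.org/pav/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/dataset> a void:Dataset ;
    dcterms:title "Example dataset"@en ;
    dcterms:description "Placeholder description."@en ;
    dcterms:license <http://example.org/license> ;
    pav:createdBy <http://example.org/people/someone> ;
    pav:createdOn "2014-07-23T00:00:00Z"^^xsd:dateTime ;
    void:dataDump <http://example.org/file.rdf> ;
    void:triples 72 .
EOF

# Optional syntax check, if rapper is installed:
#   rapper -i turtle --count void.ttl
```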

Conclusion:

The RDF is in good shape, but can be improved. It is valid and human-readable. It should, however, make more use of ontologies that are already in use. Importantly, the data should be complemented with VoID descriptions.

EW1: Updating Bioclipse with OPSIN 1.6.0

Hypothesis: Bioclipse works just as well with OPSIN 1.6.0 as it does with 1.5.0.

Start date: 2014-07-20 End date: 2014-07-21

Description: Bioclipse in the development branch has OPSIN 1.5.0 exposed with the opsin manager. The intention of this experiment is to update Bioclipse with OPSIN 1.6.0, keeping the opsin manager methods working.

Methods:

• update the OPSIN version
• test with the test suite
• publish the patches

Report:

I still had a working development environment around. As I had installed Eclipse 4.4 a few days earlier, I opened the Eclipse workspace with this version, which triggered an irreversible upgrade of the workspace so that I cannot return to Eclipse 4.3. The test suite, defined by the AllOpsinManagerPluginTests class, was run as a “JUnit Plug-in Test” using Eclipse. This showed two failures in the APITest test class, related to the @TestClass and @TestMethod annotations. The annotations were added and committed as a patch to ensure that no failures were reported.

Then the opsin-1.6.0-excludingInChI-jar-with-dependencies.jar was downloaded from the OPSIN download page. This version was selected because the 1.5.0 version excluded the InChI bits too and these are already available from other Bioclipse plugins. The new jar was copied into the net.bioclipse.opsin’s jar/ folder and .classpath, MANIFEST.MF, and build.properties were updated accordingly.

The result was successfully tested using the aforementioned AllOpsinManagerPluginTests class and by running Bioclipse itself and using the opsin manager from the JavaScript console with the command ui.open(opsin.parseIUPACName("benzene")).

The two patches were made available as pull request 46.

Conclusion:

No special updates were needed and Bioclipse works with OPSIN 1.6.0 just as it did with OPSIN 1.5.0.