Interoperable Molecular Structure Search Web Server

User Manual

This manual describes the functionality of the provided user interface, and the syntax and semantics of the molecular and substructural queries that users can submit to the SPARQL endpoint.

Before using the interface, we recommend that users verify that their setup meets the following requirements:

KNOWN ISSUE: if the user switches tabs during query evaluation, the result view of the corresponding tab is sometimes not updated when the query finishes. The result data is not lost, though: manually selecting a result view mode (e.g. by clicking on "Table") fixes the problem. This issue has already been reported upstream.

User interface

The user interface is divided into three parts: the SPARQL query editor, the result browser, and the list of example queries. We briefly describe the functionality of each.

Example queries

Example queries are provided in the right margin of the interface. Users can use them as a starting point for constructing more advanced queries.

Query editor

The text area can be used to write SPARQL queries, to select the target SPARQL endpoint (any publicly available endpoint, not only the IOCB one), and to execute the query. The queries are specified in standard SPARQL syntax. The interface provides basic syntax highlighting and autocompletion of prefixes and keywords. Several queries can be edited and executed at once by using the integrated tabs. Tabs and their contents persist even if the user closes the frontend page in the browser.

An example query can be run as follows:

Query result visualizer

The visualization of the SPARQL query results is displayed after the query is successfully executed. Several switchable views of the result data are provided:

A good example of visualizer usage is provided by the results of the example query UniProt Interoperability: after running the query and receiving the result, switch the visualizer to the Pivot Table view and drag the ORGANISM variable to the field marked Rows to obtain the following statistic about the occurrences of activity measurements on proteins of each species:

Sachem endpoints

Procedure calls in SPARQL

To avoid complicating the SPARQL syntax, procedure calls are expressed as triple patterns that use special predicate IRIs to identify the procedure. The advantage of this approach is its transparency for the user: procedure calls, their arguments and the corresponding results can be queried in the same way as the data triples that are stored persistently in a database.

In our setting, the object of the triple pattern always represents the arguments and the subject always represents the results of the procedure call, so the pattern has the form: result procedure argument. The object (argument) in such a triple must be a blank node expressed in abbreviated form (i.e. by using []). The properties of this blank node are then understood as procedure arguments; the objects of these properties must be constant values or bound variables.
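The pattern described above can be sketched with a small Python helper that renders a procedure call as a triple pattern. This is an illustration only: the helper names literal and procedure_call are hypothetical, and the SMILES string passed as sachem:query is an assumed query format.

```python
# Sketch: render "result procedure [ arguments ]" as a SPARQL triple pattern.
# The helper names are hypothetical; the SMILES query value is an assumption.

def literal(text):
    """Quote a Python string as a SPARQL string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def procedure_call(result, procedure, arguments):
    """Build a triple pattern whose object is an abbreviated blank node.

    `arguments` maps argument IRIs (as prefixed names) to constant values
    or bound variables that are already written in SPARQL syntax.
    """
    properties = " ; ".join(f"{name} {value}" for name, value in arguments.items())
    return f"{result} {procedure} [ {properties} ] ."


pattern = procedure_call(
    "?compound",
    "sachem:substructureSearch",
    {"sachem:query": literal("c1ccccc1")},
)
print(pattern)
# prints: ?compound sachem:substructureSearch [ sachem:query "c1ccccc1" ] .
```

The subject ?compound receives the results, the predicate names the procedure, and the bracketed blank node carries the arguments.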

Procedure arguments and results

During the evaluation, the argument values are passed to the corresponding procedure. The IRIs of the arguments are specified in the procedure call definition. This mapping brings several advantages: easy specification of default argument values, connection of argument values to their names (specified as IRIs), the ability to process the arguments in any order, and better description of the arguments by the database ontology.

Results of a procedure call are represented as the subject of the triple pattern. If the procedure returns a simple data type (e.g. a list of URIs), the variable used as the subject is bound to the results of the procedure. To allow returning structured results with several individual values (e.g. compounds that match a given similarity search query along with their similarity scores), we have defined a special way to represent multi-value results of procedure calls, similar to the way multiple procedure arguments are passed: the subject of the procedure call is also specified as a blank node in abbreviated form, and the individual properties of the blank node represent the corresponding fields of the structured result value. The objects of these properties are either variables that are bound to the result values during the procedure evaluation, or constants that serve as filters on the result values.
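A structured result can be sketched in the same spirit, with a blank node on both sides of the procedure predicate. The sketch below is hedged: the result property names sachem:compound and sachem:score and the SMILES query value are assumptions made for illustration, and the helper names are hypothetical.

```python
# Sketch: a procedure call whose subject is also an abbreviated blank node,
# so that each field of the structured result is bound to its own variable.
# ASSUMPTION: the result properties sachem:compound and sachem:score and the
# SMILES query value are placeholders chosen for illustration.

def blank_node(properties):
    """Render a mapping of predicate -> object as an abbreviated blank node."""
    body = " ; ".join(f"{predicate} {obj}" for predicate, obj in properties.items())
    return f"[ {body} ]"


def structured_call(result_fields, procedure, arguments):
    """Build a triple pattern with blank nodes for both results and arguments."""
    return f"{blank_node(result_fields)} {procedure} {blank_node(arguments)} ."


pattern = structured_call(
    {"sachem:compound": "?compound", "sachem:score": "?score"},
    "sachem:similaritySearch",
    {"sachem:query": '"CC(=O)Oc1ccccc1C(=O)O"'},
)
print(pattern)
```

Replacing one of the result variables with a constant would turn that field into a filter on the returned values.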

Supported datasets

The server provides a distinct SPARQL endpoint for each of the supported datasets. Currently supported datasets include:

For convenience, we write the IRI prefix idsm: as a shortcut for https://idsm.elixir-czech.cz/sparql/endpoint/, and similarly the IRI prefix sachem: as a shortcut for https://bioinfo.uochb.cas.cz/sparql-endpoint/sachem/. For example, the dataset specified by idsm:chebi is available at the IRI https://idsm.elixir-czech.cz/sparql/endpoint/chebi.

Supported procedure calls
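The prefix convention amounts to simple string concatenation, as the following minimal Python sketch shows. The function name expand is hypothetical; the two prefix IRIs are the ones given in this manual.

```python
# Sketch: expand a prefixed name into a full IRI using the prefixes
# defined in this manual. The function name is hypothetical.

PREFIXES = {
    "idsm": "https://idsm.elixir-czech.cz/sparql/endpoint/",
    "sachem": "https://bioinfo.uochb.cas.cz/sparql-endpoint/sachem/",
}


def expand(prefixed_name):
    """Return the full IRI for a name such as 'idsm:chebi'."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


print(expand("idsm:chebi"))
# prints: https://idsm.elixir-czech.cz/sparql/endpoint/chebi
```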

The endpoints support queries that may call the chemical substructure and similarity search procedures on all supported datasets.
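Outside the web interface, such a query could also be submitted programmatically. The sketch below only prepares the HTTP request and sends nothing. It assumes that the endpoint accepts the standard SPARQL 1.1 Protocol form-encoded POST, that sachem:query takes a SMILES string, and that sachem:topn limits the number of results; the function name build_request is hypothetical.

```python
# Sketch: prepare (but do not send) a SPARQL 1.1 Protocol request for one
# dataset endpoint. ASSUMPTIONS: the endpoint accepts a form-encoded POST,
# sachem:query takes a SMILES string, and sachem:topn limits the result count.
import urllib.parse
import urllib.request

IDSM = "https://idsm.elixir-czech.cz/sparql/endpoint/"
SACHEM = "https://bioinfo.uochb.cas.cz/sparql-endpoint/sachem/"


def build_request(dataset, query):
    """Return a POST request carrying the query for the given dataset."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        IDSM + dataset,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


query = f"""PREFIX sachem: <{SACHEM}>
SELECT ?compound WHERE {{
  ?compound sachem:substructureSearch [ sachem:query "c1ccccc1" ; sachem:topn 10 ] .
}}"""

request = build_request("chebi", query)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would execute the query; the response body can then be parsed as SPARQL JSON results.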

Similarity search

The similarity search procedure call is mapped to the property sachem:similaritySearch. It accepts the following arguments:

The results of the procedure are compound values that have the following properties:

There is also a simplified variant of the similarity search procedure, mapped to the property sachem:similarCompoundSearch. It uses the same arguments as sachem:similaritySearch, but returns the identified compounds directly as single-value non-structured results.

Substructure search

The substructure search procedure is mapped to the property sachem:substructureSearch. It uses the arguments sachem:query and sachem:topn with the same meaning as in the previous case, together with the following extra arguments:

The results (the compounds that contain the query as a substructure) are non-structured and are returned directly as the subject of the procedure call triple.