SPARQL Endpoint Access

Querying the IDSM SPARQL endpoint

IDSM supports querying through the SPARQL Protocol. This allows IDSM to be accessed in a standard way from different programming languages, using any library, package, or command-line tool that implements the protocol. The following sections demonstrate how to access IDSM from Python, Java, and Bash.
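To show what "implementing the protocol" amounts to, here is a minimal sketch that uses only the Python standard library and the protocol's query-via-GET binding. The query text travels in the URL-encoded `query` parameter, and the `Accept` header selects the result format. The helper names `build_get_request` and `run_query` are hypothetical. The sketch assumes that the IDSM endpoint accepts GET requests and returns `application/sparql-results+json` when asked for it. The protocol also allows POST.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm"
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE { ?entity rdfs:label ?label } LIMIT 10
"""


def build_get_request(endpoint, query, accept="application/sparql-results+json"):
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    # The protocol carries the query text in the URL-encoded `query` parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Content negotiation: the Accept header chooses the result serialization.
    return urllib.request.Request(url, headers={"Accept": accept})


def run_query(endpoint, query):
    """Send the request and decode the SPARQL JSON results document."""
    with urllib.request.urlopen(build_get_request(endpoint, query)) as response:
        return json.load(response)
```

Calling `run_query(ENDPOINT, QUERY)` performs the network request. The returned dictionary has the same shape that the libraries shown below work with.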

Querying a SPARQL endpoint from Python

To access IDSM from Python, the SPARQLWrapper module can be used. SPARQLWrapper is a simple Python wrapper around a SPARQL service for executing queries remotely. It builds the query request and can optionally convert the result into a more convenient format.

from SPARQLWrapper import SPARQLWrapper, JSON

# Public IDSM SPARQL endpoint
endpoint = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm"

# Retrieve the first 10 resources that have an rdfs:label
query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT * WHERE {
        ?entity rdfs:label ?label
    }
    LIMIT 10
    """

sparql = SPARQLWrapper(endpoint)
sparql.setQuery(query)
# Request SPARQL JSON results; convert() then returns a Python dict
sparql.setReturnFormat(JSON)

results = sparql.query().convert()

# Each binding maps a variable name to {"type": ..., "value": ...}
for result in results["results"]["bindings"]:
    print(result["entity"]["value"] + ", " + result["label"]["value"])
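The loop above assumes that every variable is bound in every solution. That holds for this query but not, for example, for queries that use OPTIONAL. The hypothetical helper `bindings_to_rows` below is a sketch and is not part of SPARQLWrapper. It flattens a SPARQL 1.1 JSON results document by reading the variable list from `head.vars` and mapping unbound variables to `None`. The sample data, with example.org IRIs, is invented for illustration.

```python
def bindings_to_rows(results):
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts.

    Variables left unbound in a solution (allowed by the format, e.g. with
    OPTIONAL) are mapped to None instead of raising KeyError.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in results["results"]["bindings"]
    ]


# A hand-written document in the same shape that sparql.query().convert() returns.
sample = {
    "head": {"vars": ["entity", "label"]},
    "results": {"bindings": [
        {"entity": {"type": "uri", "value": "http://example.org/a"},
         "label": {"type": "literal", "value": "A"}},
        {"entity": {"type": "uri", "value": "http://example.org/b"}},
    ]},
}

for row in bindings_to_rows(sample):
    print(row["entity"], row["label"])
```

With a live query, pass the result of `sparql.query().convert()` instead of `sample`.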

Querying a SPARQL endpoint from Java

To access IDSM from Java, Apache Jena can be used. Jena is a free and open-source framework for building Semantic Web and Linked Data applications. It also includes a library for querying remote SPARQL services.

import org.apache.jena.rdfconnection.RDFConnection;

public class SPARQLQueryExample {
    public static void main(String[] args) {
        // Public IDSM SPARQL endpoint
        String endpoint = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm";

        // Text blocks (""") require Java 15 or newer
        String query = """
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

            SELECT * WHERE {
              ?entity rdfs:label ?label
            }
            LIMIT 10
            """;

        // try-with-resources closes the connection when done
        try (RDFConnection conn = RDFConnection.connect(endpoint)) {
            // The callback is invoked once for each solution (QuerySolution)
            conn.querySelect(query, (qs) -> {
                System.out.println(qs.get("entity") + ", " + qs.get("label"));
            });
        }
    }
}

Querying a SPARQL endpoint from Bash

To access IDSM from Bash, the curl utility can be used. curl is a command-line tool for transferring data specified with URL syntax. It is included in the vast majority of Linux distributions and is typically installed by default.

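# Public IDSM SPARQL endpoint
endpoint="https://idsm.elixir-czech.cz/sparql/endpoint/idsm"

query="
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  ?entity rdfs:label ?label
}
LIMIT 10"

# --data-urlencode sends the query as a URL-encoded form parameter in a POST
# request; the Accept header asks for the results in CSV format
curl -H "Accept: text/csv" --data-urlencode "query=${query}" "${endpoint}"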
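The same request can be reproduced without curl, for example from Python. With `--data-urlencode`, curl sends a POST request whose body is `application/x-www-form-urlencoded`. This corresponds to the SPARQL Protocol's query-via-URL-encoded-POST binding. The `Accept: text/csv` header asks for the SPARQL 1.1 CSV results format, which starts with a header row of variable names followed by one row per solution. The following is a minimal standard-library sketch. The function names are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request

ENDPOINT = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm"


def build_post_request(endpoint, query, accept="text/csv"):
    """Mirror: curl -H "Accept: <accept>" --data-urlencode "query=<query>" <endpoint>"""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # supplying a body makes urllib send POST, as curl does with --data-*
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def parse_csv_results(text):
    """Parse SPARQL CSV results into one dict per solution, keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))


def run_csv_query(endpoint, query):
    """Send the POST request and parse the CSV response body."""
    with urllib.request.urlopen(build_post_request(endpoint, query)) as response:
        return parse_csv_results(response.read().decode("utf-8"))
```

Note that the CSV format serializes every value as plain text and drops datatypes and language tags. Request JSON instead when those details matter.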

Local Access

IDSM is not designed to run locally on a personal computer or a small server. It consists of several components, and a full installation involves many steps. A loaded dataset is often very large, and the system aims to update it atomically, which requires substantial resources. For example, updating the data from PubChemRDF can take up to two days and consume 1 TB of RAM. For these reasons, we have always treated IDSM as a server-side service and have not considered local use. On the other hand, we do not prevent local use in any way: all source code is freely available, together with installation notes.

IDSM consists of the following components:
  • chemweb - IDSM SPARQL engine
  • pgms - PostgreSQL extension for mass spectrometry
  • pgsparql - PostgreSQL extension for IDSM SPARQL engine support
  • sachem - PostgreSQL extension for chemical substructure and similarity search
  • loaders - IDSM dataset loaders
  • notes - IDSM installation notes
  • website - IDSM website (not needed for local use)

In addition, we have recently extended the installation notes with a Docker example that allows users to run a subset of IDSM locally. This subset includes data from the MoNA and ISDB databases. The example runs two Docker containers using Docker Compose. The first container holds a PostgreSQL database with all the necessary extensions. The second is built on top of Apache Tomcat and includes the IDSM SPARQL engine itself.

To run the containers, please use the following command:

docker compose -f notes/examples/docker/ms/compose.yaml up -d

Once started, the SPARQL endpoint is available at http://localhost:8080/sparql/.
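A cheap ASK query can check from a script whether the local endpoint responds and whether any data have been loaded yet. The following is a minimal sketch. It assumes that the URL above accepts SPARQL Protocol GET requests directly and that the engine answers ASK queries in the SPARQL JSON results format. If that URL serves a web page instead, substitute the actual endpoint path. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the local URL from the text accepts SPARQL Protocol requests.
LOCAL_ENDPOINT = "http://localhost:8080/sparql/"
ASK_ANY_TRIPLE = "ASK { ?s ?p ?o }"


def build_ask_request(endpoint=LOCAL_ENDPOINT):
    """Build a GET request asking whether the endpoint exposes any triple at all."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": ASK_ANY_TRIPLE})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_ask_result(document):
    """An ASK result in SPARQL JSON format carries a single top-level boolean."""
    return bool(document["boolean"])


def has_data(endpoint=LOCAL_ENDPOINT, timeout=30):
    """Return True once at least one triple is visible through the endpoint."""
    with urllib.request.urlopen(build_ask_request(endpoint), timeout=timeout) as response:
        return parse_ask_result(json.load(response))
```

`has_data()` is expected to return `False` on a freshly started, empty database and `True` once a loader has finished. A connection error indicates that the containers are not yet up.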

Right after the containers start, the database is empty. Please use the following command to load data from ISDB:

docker compose -f notes/examples/docker/ms/compose.yaml exec db /usr/bin/load-isdb.sh

Similarly, use the following command to load data from MoNA:

docker compose -f notes/examples/docker/ms/compose.yaml exec db /usr/bin/load-mona.sh

Note that loading the data requires at least 64 GB of RAM.
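The memory requirement can be checked before running the loaders. The following minimal sketch reads total physical memory through POSIX `os.sysconf`, which works on Linux but not on Windows. It interprets the 64 GB figure as 64 GiB. The names `total_ram_bytes` and `has_enough_ram` are hypothetical. With Docker Desktop, containers run inside a virtual machine whose memory limit is configured separately, and this host-level check cannot see that limit.

```python
import os

# From the requirement above; interpreted here (an assumption) as binary gigabytes.
REQUIRED_GIB = 64


def total_ram_bytes():
    """Total physical memory of the host, via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(required_gib=REQUIRED_GIB):
    """True if the host has at least `required_gib` GiB of physical memory."""
    return total_ram_bytes() >= required_gib * 1024**3


if __name__ == "__main__":
    print(f"{total_ram_bytes() / 1024**3:.1f} GiB total, enough to load: {has_enough_ram()}")
```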