OIH SPARQL

About

This page describes the SPARQL queries we use and how they connect with the profile guidance in this document. It shows how the queries relate to, and depend on, the Gleaner prov as well as the Authoritative Reference elements of the patterns. The Gleaner prov is expected to be present, though it can be made optional where other indexing systems are used that do not provide this prov shape. The SPARQL looks for both the Gleaner prov and the Authoritative Reference elements.

The relevant elements differ from pattern to pattern. For example, they might be the publisher and provider elements for CreativeWorks, but the identity element for People and Organizations.

 1  PREFIX prov: <http://www.w3.org/ns/prov#>
 2  PREFIX con: <http://www.ontotext.com/connectors/lucene#>
 3  PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
 4  PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>
 5  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 6  PREFIX schema: <https://schema.org/>
 7  PREFIX schemaold: <http://schema.org/>
 8  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 9
10  SELECT DISTINCT ?g ?s ?wat ?orgname ?domain ?type ?score ?name ?url ?lit ?description ?headline
11  WHERE {
12     ?lit bds:search "coral" .
13     ?lit bds:matchAllTerms "false" .
14     ?lit bds:relevance ?score .
15     GRAPH ?g {
16      ?s ?p ?lit .
17      ?s rdf:type ?type .
18      OPTIONAL { ?s schema:name ?name . }
19      OPTIONAL { ?s schema:headline ?headline . }
20      OPTIONAL { ?s schema:url ?url . }
21      OPTIONAL { ?s schema:description ?description . }
22    }
23     ?sp prov:generated ?g .
24     ?sp prov:used ?used .
25     ?used prov:hadMember ?hm .
26     ?hm prov:wasAttributedTo ?wat .
27     ?wat rdf:name ?orgname .
28     ?wat rdfs:seeAlso ?domain .
29  }
30  ORDER BY DESC(?score)
31  LIMIT 30
32  OFFSET 0

Lines 12-14

Note that the above SPARQL is not standards compliant. It uses vendor-specific syntax that is not part of the SPARQL standard. This is not uncommon, as triplestore vendors often add their own syntax to offer additional functionality.

A common extension, seen here, is a full text index that allows more complex and faster searches than are possible with FILTER regex. These three lines use the bds: full text search predicates and will only work in the current OIH triplestore (Blazegraph). Other triplestores, such as Jena, provide similar built-in extensions with their own syntax.
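For comparison, a minimal sketch of a standards-compliant version of the same search is shown here. It swaps the Blazegraph full text index for FILTER regex, so it should run on any SPARQL 1.1 store, but it will typically be slower and it has no relevance score to sort on. The search term "coral" is taken from the query above; everything else is an illustrative assumption, not the query OIH runs.

```sparql
# Portable alternative to lines 12-14: no vendor-specific predicates.
SELECT DISTINCT ?g ?s ?type ?lit
WHERE {
  GRAPH ?g {
    ?s ?p ?lit .
    ?s a ?type .
    # Keep only literal objects, then match the term case-insensitively.
    FILTER(isLiteral(?lit) && regex(str(?lit), "coral", "i"))
  }
}
LIMIT 30
```

Because regex has to test every candidate literal, this form is best kept for small graphs or as a fallback when no full text index is available.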

Lines 18-21

These lines demonstrate the use of the OPTIONAL keyword. These triples are not required to be present in a resource. If they are present, their values are returned; if not, the resource still matches and the corresponding variables are left unbound.
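Since OPTIONAL can leave variables unbound, a client often wants a single display label regardless of which property a provider used. The following is a hedged sketch of one way to do that with the standard COALESCE function; the ?label variable and the "(no label)" fallback string are illustrative assumptions and are not part of the OIH query.

```sparql
PREFIX schema: <https://schema.org/>

SELECT ?s ?label
WHERE {
  GRAPH ?g {
    ?s a ?type .
    OPTIONAL { ?s schema:name ?name . }
    OPTIONAL { ?s schema:headline ?headline . }
    # COALESCE returns the first argument that is bound and error-free.
    BIND(COALESCE(?name, ?headline, "(no label)") AS ?label)
  }
}
LIMIT 10
```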

Lines 23-28

These lines are standard SPARQL, but they search across triples that do not come from the provider graphs. Rather, they look at triples generated by Gleaner, the indexing program used by OIH.

Note that Gleaner is not a dependency of this project, and other indexing approaches and software could be used. As pointed out in the documentation, this approach is based on structured data on the web and web architecture approaches, so any indexing system following those approaches can be used.

These triples are used to track the indexing event and the sources indexed. They provide some additional provenance for the information collected, but do not change or even extend what the providers are publishing.

As such, these statements could be removed, and all that would be lost is the indexing activity information (the ?wat, ?orgname and ?domain values in the results).
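As a sketch of what that looks like, here is the query above with the prov statements dropped and the SELECT list trimmed to the variables that can still be bound. It assumes the same Blazegraph store; the bds prefix is declared explicitly here for readability.

```sparql
PREFIX bds: <http://www.bigdata.com/rdf/search#>
PREFIX schema: <https://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?g ?s ?type ?score ?name ?url ?lit ?description ?headline
WHERE {
  # Blazegraph full text search (vendor-specific).
  ?lit bds:search "coral" .
  ?lit bds:matchAllTerms "false" .
  ?lit bds:relevance ?score .
  # Provider-published triples only; no Gleaner prov is consulted.
  GRAPH ?g {
    ?s ?p ?lit .
    ?s rdf:type ?type .
    OPTIONAL { ?s schema:name ?name . }
    OPTIONAL { ?s schema:headline ?headline . }
    OPTIONAL { ?s schema:url ?url . }
    OPTIONAL { ?s schema:description ?description . }
  }
}
ORDER BY DESC(?score)
LIMIT 30
OFFSET 0
```

A query in this form would also return graphs loaded by an indexer that does not write the Gleaner prov shape.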

Lines 30-32

These lines represent three standard SPARQL solution modifiers.

First is the ORDER BY directive. This is used to order the results by one of the returned variables. In this case we are using the ?score variable, which comes from the vendor-specific syntax noted in lines 12-14. This score is the ranking score for a resource search against the full text index. However, the sort key could equally be any variable coming from standards-compliant SPARQL. Sorting can be done on alphanumeric values in ascending (ASC) or descending (DESC) order.

LIMIT restricts the number of results returned. It is followed by OFFSET, which skips the first n results. Combined with ORDER BY, these two are useful for pagination.
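To make the pagination idea concrete, here is a small standards-compliant sketch that requests the second page of 30 results. In general, page n (counting from 1) uses OFFSET = (n - 1) × LIMIT. Sorting on schema:name, with ?s as a tie-breaker so that page boundaries stay stable, is an assumption made for illustration; the OIH query sorts on ?score instead.

```sparql
PREFIX schema: <https://schema.org/>

SELECT ?s ?name
WHERE {
  GRAPH ?g { ?s schema:name ?name . }
}
# A deterministic order is needed for pages to be consistent across requests.
ORDER BY ASC(?name) ?s
LIMIT 30    # page size
OFFSET 30   # skip page 1 (results 1-30), return results 31-60
```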