Keywords and Defined Terms#

About#

This section looks at how keywords can be connected with Defined Terms that point to external vocabularies, where those vocabularies follow a publishing pattern such as the W3C Best Practice Recipes for Publishing RDF Vocabularies.

The pattern breaks down a bit when attempting to connect with things like the Global Change Master Directory keywords. This impedance is caused by publishing approaches for the terms that don’t align well with the above publishing practices. This does not mean we cannot use these terms, rather that the community may connect them in multiple ways, which can result in some ambiguity in linking.

A person could adapt the pattern to connect things like the Global Change Observing System or EARTH SCIENCE > OCEANS > OCEAN CHEMISTRY. The latter of these does have a UUID (6eb3919b-85ce-4988-8b78-9d0018fd8089), but this is not a dereferenceable PID.

Note

The topic of keyword linking with DefinedTerms is under review in the Science on Schema work at ESIP. See Describing a Dataset for their latest recommendations.

Keywords#

The Schema.org keywords property of CreativeWork accepts three types of value: DefinedTerm, Text, and URL.

The example here shows all three approaches to defining keywords. Region X is a classic text keyword. Of the other two, one is defined as a DefinedTerm and the other as a URL.

{
    "@context": {
        "@vocab": "https://schema.org/"
    },
    "@type": "Map",
    "@id": "https://example.org/id/XYZ",
    "name": "Name or title of the document",
    "description": "Description of the map to aid in searching",
    "url": "https://www.sample-data-repository.org/creativework/map.pdf",
    "identifier": {
        "@id": "https://doi.org/10.5066/F7VX0DMQ",
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "value": "doi:10.5066/F7VX0DMQ",
        "url": "https://doi.org/10.5066/F7VX0DMQ"
    },
    "keywords": [
        "Region X",
        {
            "@id": "http://purl.org/dc/dcmitype/Image",
            "@type": "DefinedTerm",
            "inDefinedTermSet": "http://purl.org/dc/terms/DCMIType",
            "termCode": "Image",
            "name": "Image"
        },
        {
            "@id": "https://www.wikidata.org/wiki/Q350134",
            "@type": "URL",
            "url": "https://www.wikidata.org/wiki/Q350134"
        }
    ]
}

The Python cell below frames the document on its keywords and renders the resulting graph.

import json
from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph
from rdflib.extras.external_graph_libs import rdflib_to_networkx_graph
from pyld import jsonld
import graphviz
import os, sys

currentdir = os.path.dirname(os.path.abspath(''))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0, parentdir)
from lib import jbutils

with open("../../../odis-in/dataGraphs/thematics/terms/graphs/map.json") as dgraph:
    doc = json.load(dgraph)

frame = {
  "@context": {"@vocab": "https://schema.org/"},
  "@explicit": "true",
  "@requireAll": "true",
  "@type":     "Map",
  "keywords": ""
}

context = {
    "@vocab": "https://schema.org/",
}

compacted = jsonld.compact(doc, context)

framed = jsonld.frame(compacted, frame)

jbutils.show_graph(framed)

Text#

Keywords can be defined as a Text value. This is the most common approach, though it lacks some of the benefits of the other two. For example, it doesn’t allow terms to be dereferenced on the net, and it doesn’t allow connections to be made in the graph between common terms through their subject IRIs.

{
  "@context": "https://schema.org/",
  "keywords": [
    "nitrous oxide", 
    "Central Pacific", 
    "headspace equilibration", 
    "SRI Greenhouse Gas Monitoring Gas Chromatograph", 
    "CTD profiler", 
    "Gas Chromatograph"
  ]
}

Note

Be sure to use the [] notation to define the keywords. This defines an array of items rather than a single item. If you write the value as one quoted string, such as “term1, term2, term4”, you have only created a single text string with comma-separated values, and it is viewed as a single string in the graph. The [] notation creates an array of strings, each connected to the subject IRI by the keywords property.
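
The difference between the two forms can be sketched in Python. This is a minimal sketch, not part of the original guidance: the helper name split_keywords is hypothetical, and it assumes that every keyword is plain text and that no individual keyword contains a comma.

```python
import json


def split_keywords(value):
    """Return a list of keyword strings from a string or a list of strings.

    Hypothetical helper: a comma-separated string is split into separate
    keywords; a list is returned with surrounding whitespace stripped.
    Assumes text keywords only, none of which contains a comma.
    """
    if isinstance(value, str):
        parts = value.split(",")
    else:
        parts = list(value)
    # Strip whitespace and drop empty entries (for example after a trailing comma).
    return [p.strip() for p in parts if p.strip()]


# A single comma-separated string: one keywords value in the graph.
single_string = {"@context": "https://schema.org/", "keywords": "term1, term2, term4"}

# The same terms as an array: three separate keywords values.
as_array = dict(single_string, keywords=split_keywords(single_string["keywords"]))

print(json.dumps(as_array, indent=2))
```

Splitting on commas is only safe when the source keywords are known not to contain commas themselves, so a real pipeline may need a more careful rule.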

URL#

Keywords can also point to a URL. This provides a way to link to a vocabulary entry that defines the term. This approach has the benefit of linking to more detail, but it does not easily provide descriptive text for humans. Nothing prevents adding a text keyword followed by another entry with a related URL.

DefinedTerm#

This is the most complex approach. Keywords can point to a DefinedTerm that belongs to a DefinedTermSet, referenced through the inDefinedTermSet property. It offers the ability to present both a human-focused textual name and a description of the term, and it allows a URL to link to the vocabulary entry that defines the term. While this approach is the most comprehensive, it adds complexity at query time when extracting and presenting the information.
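
To illustrate the extra handling that mixed keyword types require, here is a minimal sketch that derives a display label for each entry of the keywords array from the map example above. The helper name keyword_label and its fallback order (name, termCode, url, @id) are assumptions made for illustration, not a recommended convention.

```python
def keyword_label(keyword):
    """Return a display string for one keywords value.

    Handles the three forms described above: plain text, a URL node,
    and a DefinedTerm node. Hypothetical helper, not part of any library.
    """
    if isinstance(keyword, str):
        return keyword
    if isinstance(keyword, dict):
        # Prefer the human-readable name, then fall back to identifiers.
        for key in ("name", "termCode", "url", "@id"):
            if keyword.get(key):
                return keyword[key]
    return str(keyword)


# The keywords array from the map example above.
keywords = [
    "Region X",
    {
        "@id": "http://purl.org/dc/dcmitype/Image",
        "@type": "DefinedTerm",
        "inDefinedTermSet": "http://purl.org/dc/terms/DCMIType",
        "termCode": "Image",
        "name": "Image",
    },
    {
        "@id": "https://www.wikidata.org/wiki/Q350134",
        "@type": "URL",
        "url": "https://www.wikidata.org/wiki/Q350134",
    },
]

labels = [keyword_label(k) for k in keywords]
print(labels)
```

A text keyword needs no lookup at all, while the DefinedTerm and URL entries require inspecting the node, which is the query-time cost mentioned above.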

Defined Terms#

When generating structured data, a provider may wish to either use or publish a set of controlled vocabulary terms or a similar set.

Within schema.org this could be done by leveraging the “DefinedTerm” and “DefinedTermSet” types.

These types allow us both to define a set of terms and use a set of terms in describing a thing.

Note that DefinedTerm is an Intangible and can connect to most types in Schema.org, so we can use it in places such as the keywords property described above.

The following example is from the Schema.org DefinedTermSet reference.

[
    {
        "@context": {
            "@vocab": "https://schema.org/"
        }
    },
    {
        "@type": "DefinedTermSet",
        "@id": "http://openjurist.org/dictionary/Ballentine",
        "name": "Ballentine's Law Dictionary",
        "description": "A description of Ballentine's Law Dictionary Term Set"
    },
    {
        "@type": "DefinedTerm",
        "@id": "http://openjurist.org/dictionary/Ballentine/term/calendar-year",
        "name": "calendar year",
        "description": "The period from January 1st to December 31st, inclusive, of any year.",
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "@id": "http://openjurist.org/dictionary/Ballentine"
        }
    },
    {
        "@type": "DefinedTerm",
        "@id": "http://openjurist.org/dictionary/Ballentine/term/schema",
        "name": "schema",
        "description": "A representation of a plan or theory in the form of an outline or model.",
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "@id": "http://openjurist.org/dictionary/Ballentine"
        }
    }
]
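
As a rough illustration of how the terms and their term set relate, the sketch below groups DefinedTerm names by the @id of the DefinedTermSet they belong to. The function name terms_by_set is hypothetical, and the sketch assumes each node carries a single string @type, as in the example above.

```python
from collections import defaultdict


def terms_by_set(nodes):
    """Group DefinedTerm names by the @id of their DefinedTermSet.

    Hypothetical helper; assumes "@type" is a single string on each node.
    """
    grouped = defaultdict(list)
    for node in nodes:
        if node.get("@type") != "DefinedTerm":
            continue
        term_set = node.get("inDefinedTermSet")
        # inDefinedTermSet may be a nested node or a plain IRI string.
        set_id = term_set.get("@id") if isinstance(term_set, dict) else term_set
        grouped[set_id].append(node.get("name"))
    return dict(grouped)


# A trimmed copy of the nodes from the example above.
ballentine = "http://openjurist.org/dictionary/Ballentine"
nodes = [
    {"@type": "DefinedTermSet", "@id": ballentine, "name": "Ballentine's Law Dictionary"},
    {
        "@type": "DefinedTerm",
        "@id": ballentine + "/term/calendar-year",
        "name": "calendar year",
        "inDefinedTermSet": {"@type": "DefinedTermSet", "@id": ballentine},
    },
    {
        "@type": "DefinedTerm",
        "@id": ballentine + "/term/schema",
        "name": "schema",
        "inDefinedTermSet": {"@type": "DefinedTermSet", "@id": ballentine},
    },
]

grouped = terms_by_set(nodes)
print(grouped)
```

Because the helper accepts either a nested node or a bare IRI for inDefinedTermSet, it also works with the shorter form used in the keywords example earlier on this page.
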

The Python cell below compacts the loaded document against the Schema.org context and renders the resulting graph.

import json
from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph
from rdflib.extras.external_graph_libs import rdflib_to_networkx_graph
from pyld import jsonld
import graphviz
import os, sys

currentdir = os.path.dirname(os.path.abspath(''))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0, parentdir)
from lib import jbutils

with open("../../../odis-in/dataGraphs/thematics/terms/graphs/map.json") as dgraph:
    doc = json.load(dgraph)

context = {
    "@vocab": "https://schema.org/",
}

compacted = jsonld.compact(doc, context)
jbutils.show_graph(compacted)

References#