Ontologies
In order to make Linked Open Data (LOD) as consistent and precise as possible, it must conform to an ontology. An ontology creates rules for your data, setting out exactly what types of entities and relationships can exist in your data. Ontologies make the links between your data points semantic, which means that a computer can read and understand the specific meaning of the relationships.
Ontologies as Worldviews
In metaphysics, ontology is the philosophical study of being: what pieces fit together to make up the world and how these pieces interrelate. In the information sciences, an ontology encodes a conceptual model (an abstracted way of thinking) in a machine-readable way. More specifically, for LOD, a formal ontology is a set of classifications that define which subjects, objects, and predicates can be used to describe a given dataset.
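To make the idea of an ontology defining which subjects, objects, and predicates can be used more concrete, here is a minimal Python sketch. The class and predicate names (`Person`, `Book`, `Year`, `wrote`, `published_in`) are hypothetical and do not come from any real ontology. The check is also a deliberate simplification: RDF tooling usually treats domain and range as input for inference rather than as hard validation rules.

```python
# Hypothetical toy ontology: each predicate names the class allowed as its
# subject (domain) and the class allowed as its object (range).
ONTOLOGY = {
    "wrote": {"domain": "Person", "range": "Book"},
    "published_in": {"domain": "Book", "range": "Year"},
}

# Hypothetical data: the class assigned to each entity.
ENTITY_TYPES = {
    "Margaret Atwood": "Person",
    "The Handmaid's Tale": "Book",
    "1985": "Year",
}


def conforms(subject, predicate, obj):
    """Return True if the triple uses a known predicate with a valid domain and range."""
    rule = ONTOLOGY.get(predicate)
    if rule is None:
        return False  # the ontology does not define this predicate
    return (
        ENTITY_TYPES.get(subject) == rule["domain"]
        and ENTITY_TYPES.get(obj) == rule["range"]
    )


valid = conforms("Margaret Atwood", "wrote", "The Handmaid's Tale")
invalid = conforms("The Handmaid's Tale", "wrote", "Margaret Atwood")
```

Here `valid` is true because a person wrote a book, while `invalid` is false because the subject and object are swapped and no longer match the domain and range.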
It is important to understand that all ontologies present a worldview. By specifying what is allowed to belong in a domain and range, an ontology determines what a category is and how different entities can be related to each other. Therefore, ontologies can be said to create just as much as they describe. Our data, and the decisions we make around how we choose to model and represent it, are connected to systems of power, agency, and control. At LINCS, we are just as interested in how the structures we use bring meaning to the data as we are in linking data together.
LINCS aims to adopt ontologies that:
- Support nuance, difference, and diversity, including diverse ways of knowing
- Reflect an understanding of identity categories and social classification as culturally produced, intersecting, and discursive
- Provide context and provenance of assertions
- Mitigate risk of (re-)coding cissexism, racism, colonialism, heteronormativity, ableism, classism, and other forms of discrimination into data
For more about how LINCS uses ontologies, see the Ontologies Adoption and Development Policy.
Mapping
Fitting a dataset to an ontology is called mapping. Mapping forms the data into a structure that allows for consistent representation of all the things described by the dataset and the relationships between them. Some ontologies are meant to describe a general discipline or area (e.g., CIDOC CRM covers cultural heritage data). Others, often called domain ontologies, are custom-built for more specific subject matter.
Mapping can be a complicated process. When LINCS transforms datasets into LOD, the Ontology Team drafts a conceptual mapping based on the structure of the dataset being transformed. If the triplestore already holds a transformed dataset whose structure and content resemble those of the new dataset, LINCS will adapt that existing mapping where possible. Datasets that adhere to an ontology tend to describe their data more consistently and accurately than datasets that have not yet been mapped. This consistency allows for the creation of richer and more useful LOD, especially when a widely adopted ontology is used.
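As a rough illustration of what a mapping does, the sketch below turns one flat record into subject-predicate-object triples. The record, the `ex:` prefix, the predicate names, and the `map_record` helper are all hypothetical; they are not LINCS tooling, and a real conceptual mapping to CIDOC CRM would be considerably more involved.

```python
# Hypothetical source record, as it might appear in a spreadsheet row.
record = {"id": "book-1", "title": "Example Title", "author": "Example Author"}

# Hypothetical conceptual mapping: column name -> predicate in the target ontology.
MAPPING = {"title": "ex:has_title", "author": "ex:has_author"}


def map_record(row, mapping, id_field="id"):
    """Turn one row into (subject, predicate, object) triples using the mapping."""
    subject = "ex:" + row[id_field]
    return [
        (subject, predicate, row[column])
        for column, predicate in mapping.items()
        if row.get(column)  # skip columns that are missing or empty
    ]


triples = map_record(record, MAPPING)
```

Because every row passes through the same mapping, every record ends up described with the same predicates, which is where the consistency comes from.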
Orientations
Most ontologies are oriented towards either objects or events. Object-driven ontologies place objects at the centre of their structure and connect pieces of information that are attributes of those objects. For example, a book could be modelled as a central object, with its author and publication date as connected attributes.
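A minimal sketch of the object-driven book example, written as Python tuples. All `ex:` names are hypothetical placeholders rather than terms from a real ontology.

```python
# Object-driven sketch: the book is the central node and every fact hangs off it.
# All "ex:" names are hypothetical placeholders.
object_driven = [
    ("ex:book", "ex:has_author", "ex:author"),
    ("ex:book", "ex:has_publication_date", "1985"),
]

# Collect the distinct subjects: in this model there is only one, the book.
subjects = {subject for subject, _, _ in object_driven}
```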
Event-driven ontologies place events at the centre of their structure and describe the results or outputs of these events. For example, a book could be modelled as a product of a creation event, with the author and publication date as aspects of that event.
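A minimal sketch of the event-driven book example, again with hypothetical `ex:` names. The `author_of` helper is also hypothetical; it shows that finding the author now means going through the event that produced the book.

```python
# Event-driven sketch: the creation event is the central node, and the book,
# the author, and the date are all attached to that event.
# All "ex:" names are hypothetical placeholders.
event_driven = [
    ("ex:creation_event", "ex:produced", "ex:book"),
    ("ex:creation_event", "ex:carried_out_by", "ex:author"),
    ("ex:creation_event", "ex:has_date", "1985"),
]


def author_of(book, triples):
    """Find the author by going through the event that produced the book."""
    for event, predicate, obj in triples:
        if predicate == "ex:produced" and obj == book:
            for subject, pred, value in triples:
                if subject == event and pred == "ex:carried_out_by":
                    return value
    return None


author = author_of("ex:book", event_driven)
```

The extra hop is the cost of this orientation. The benefit is that the event gives a natural place to record who was involved, when, and where.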
CIDOC CRM
LINCS uses CIDOC CRM and its extensions as its core ontology. CIDOC CRM is the Conceptual Reference Model (CRM) of the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). CIDOC CRM is an event-driven ontology designed to model cultural heritage data and was originally developed for use by museums and archives.
CIDOC CRM contains a list of entities (i.e., subjects) and properties (i.e., predicates). Entities are identified by the “E” prefix and properties by the “P” prefix. The core CIDOC CRM ontology consists of approximately eighty entities that can be connected through roughly two hundred properties.
Entities and properties are organized in a hierarchy, with each subentity or subproperty being more specific than its superentity or superproperty. For example, E12 Production is a subentity of E11 Modification, and P14 carried out by is a subproperty of P11 had participant. The following diagram describes an artwork within CIDOC CRM’s event-driven ontology. The ontology places the event of the artwork’s production at the centre (E12).
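The hierarchy idea can be sketched as a simple lookup table that is walked upwards. The entity labels below come from CIDOC CRM, but the table is a simplified slice: it keeps only one superentity per entity, and the helper names `ancestors` and `is_subentity_of` are hypothetical.

```python
# Simplified slice of the CIDOC CRM entity hierarchy: entity -> one superentity.
# Assumption: real CIDOC CRM entities can have several superentities (E12
# Production, for instance, is also a subentity of E63 Beginning of Existence);
# only one parent per entity is kept here to keep the sketch short.
SUPERENTITY = {
    "E12 Production": "E11 Modification",
    "E11 Modification": "E7 Activity",
    "E7 Activity": "E5 Event",
    "E5 Event": "E4 Period",
    "E4 Period": "E2 Temporal Entity",
    "E2 Temporal Entity": "E1 CRM Entity",
}


def ancestors(entity):
    """Return every superentity of `entity`, nearest first."""
    chain = []
    while entity in SUPERENTITY:
        entity = SUPERENTITY[entity]
        chain.append(entity)
    return chain


def is_subentity_of(child, parent):
    """True if `parent` appears anywhere above `child` in the hierarchy."""
    return parent in ancestors(child)


production_is_event = is_subentity_of("E12 Production", "E5 Event")
```

Because a production is a more specific kind of event, anything that can be said about an E5 Event can also be said about an E12 Production.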
Because CIDOC CRM is an event-centric ontology, all knowledge in it comes from events. The following is a high-level diagram of the ontology.
This diagram depicts the following:
- Actors (E39 Actor) can participate in events (E2 Temporal Entity).
- Events happen at a time (E52 Time-Span) and at a place (E53 Place).
- Events can affect things.
- Things can be physical (E24 Physical Human-Made Thing) or conceptual (E28 Conceptual Object).
- Things can have locations (E53 Place).
- Things can be categorized in many different groups (E55 Type).
- Things can have any number of identifiers (E41 Appellation).
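The relationships in the list above can be sketched with a handful of triples. The E- and P- labels are CIDOC CRM entities and properties; the `ex:` instance names and the `describe` helper are hypothetical and only serve the illustration.

```python
# Hypothetical instance data, each assigned a CIDOC CRM entity.
TYPES = {
    "ex:painting_event": "E12 Production",
    "ex:artist": "E39 Actor",
    "ex:when": "E52 Time-Span",
    "ex:studio": "E53 Place",
    "ex:painting": "E22 Human-Made Object",
    "ex:painting_name": "E41 Appellation",
    "ex:oil_painting": "E55 Type",
}

# The event sits at the centre; the thing it produced carries its own
# type and identifier.
TRIPLES = [
    ("ex:painting_event", "P14 carried out by", "ex:artist"),
    ("ex:painting_event", "P4 has time-span", "ex:when"),
    ("ex:painting_event", "P7 took place at", "ex:studio"),
    ("ex:painting_event", "P108 has produced", "ex:painting"),
    ("ex:painting", "P2 has type", "ex:oil_painting"),
    ("ex:painting", "P1 is identified by", "ex:painting_name"),
]


def describe(node):
    """List (property, object, object entity) for every statement about a node."""
    return [(p, o, TYPES[o]) for s, p, o in TRIPLES if s == node]


event_facts = describe("ex:painting_event")
```

Calling `describe("ex:painting_event")` gathers the actor, time-span, place, and product of the event in one place, which mirrors how the event ties the other entities together.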
Using an ontology like CIDOC CRM to organize your data makes it more compatible with other data. Compatibility makes your data more meaningful, reusable, and sustainable.
Extensions
CIDOC CRM’s extensions allow the ontology to cover many more specific knowledge domains, when needed. The adaptability provided by the extensions is helpful for LINCS, since LINCS datasets, while all belonging under the umbrella of cultural heritage, come from a range of disciplines:
- CRMarchaeo: Archaeological excavation
- CRMba: Archaeological buildings
- CRMdig: Provenance information
- CRMgeo: Spatiotemporal properties
- CRMinf: Argumentation
- CRMsci: Scientific observation
- CRMsoc: Social phenomena
- CRMtex: Ancient texts
- FRBRoo/LRMoo: Bibliographic information
- PRESSoo: Journals and periodicals
- DoReMus: Music
Web Annotation Data Model (WADM)
LINCS sometimes uses the Web Annotation Data Model (WADM) in conjunction with CIDOC CRM. WADM is a standard for formatting and structuring web annotations. It was designed to make descriptions of web annotations consistent and therefore more shareable and reusable across different sources. LINCS uses WADM to describe the sources used in the datasets that are being transformed.
WADM is not an extension of CIDOC CRM, nor is it event-centric by nature. While there is no pre-existing alignment between WADM and CIDOC CRM, ontologists are working to make WADM and CIDOC CRM work together. For example, at LINCS, annotations are classified as a crm:E33_Linguistic_Object in CIDOC CRM.
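For readers who have not seen a web annotation, the sketch below builds a very small one as a Python dictionary and serializes it as JSON. The `@context` value and the `type`, `body`, and `target` keys follow the W3C Web Annotation Data Model; the IRIs and the note text are hypothetical, and this is not necessarily how LINCS structures its own annotations.

```python
import json

# A minimal web annotation: a textual note (body) attached to a source (target).
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/1",  # hypothetical IRI
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "Hypothetical note about the source passage",
    },
    "target": "http://example.org/source/page1",  # hypothetical IRI
}

# Serialize to JSON so the annotation can be shared between systems.
serialized = json.dumps(annotation, indent=2)
```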
Web Ontology Language (OWL)
At LINCS, we use Web Ontology Language (OWL) to express our ontologies so they can be understood by computers. OWL is like a language (or perhaps better, grammar) for ontologies. It is used to represent complex relationships between categories of things and their properties.
The following diagram shows the kinds of statements that OWL can express:
- “P1 is identified by” is a property.
- The domain of “P1 is identified by” is “E1 CRM Entity.”
- The range of “P1 is identified by” is “E41 Appellation.”
- The opposite (flipped) relationship of “P1 is identified by” is “P1i identifies.”
- The English label for this property is “is identified by.”
Below is what OWL encoding actually looks like:
<owl:ObjectProperty rdf:about="http://www.cidoc-crm.org/cidoc-crm/P1_is_identified_by">
  <!-- Domain: the entity this property can describe -->
  <rdfs:domain rdf:resource="http://www.cidoc-crm.org/cidoc-crm/E1_CRM_Entity"/>
  <!-- Range: the entity this property points to -->
  <rdfs:range rdf:resource="http://www.cidoc-crm.org/cidoc-crm/E41_Appellation"/>
  <!-- Inverse: the same relationship read in the opposite direction -->
  <owl:inverseOf rdf:resource="http://www.cidoc-crm.org/cidoc-crm/P1i_identifies"/>
  <rdfs:label xml:lang="en">is identified by</rdfs:label>
</owl:ObjectProperty>
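To show that this encoding really is machine-readable, here is a minimal sketch that reads the property definition with Python's standard library. It uses `E1_CRM_Entity` as the domain, which is the domain CIDOC CRM defines for P1. The surrounding `rdf:RDF` wrapper and its namespace declarations are added so that the fragment parses on its own, and the `local_name` helper and `summary` dictionary are hypothetical names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# The property definition, wrapped in rdf:RDF so the namespace prefixes resolve.
OWL_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:ObjectProperty rdf:about="{CRM}P1_is_identified_by">
    <rdfs:domain rdf:resource="{CRM}E1_CRM_Entity"/>
    <rdfs:range rdf:resource="{CRM}E41_Appellation"/>
    <owl:inverseOf rdf:resource="{CRM}P1i_identifies"/>
    <rdfs:label xml:lang="en">is identified by</rdfs:label>
  </owl:ObjectProperty>
</rdf:RDF>"""


def local_name(iri):
    """Strip the CIDOC CRM namespace so only the entity or property name remains."""
    return iri[len(CRM):] if iri.startswith(CRM) else iri


def resource(element, tag):
    """Read the rdf:resource attribute of a child element such as rdfs:domain."""
    return local_name(element.find(tag).get(f"{{{RDF}}}resource"))


root = ET.fromstring(OWL_XML)
prop = root.find(f"{{{OWL}}}ObjectProperty")

summary = {
    "property": local_name(prop.get(f"{{{RDF}}}about")),
    "domain": resource(prop, f"{{{RDFS}}}domain"),
    "range": resource(prop, f"{{{RDFS}}}range"),
    "inverse": resource(prop, f"{{{OWL}}}inverseOf"),
    "label": prop.find(f"{{{RDFS}}}label").text,
}
```

A dedicated RDF library would normally handle this parsing; plain XML parsing is used here only to keep the sketch dependency-free.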
Summary
- In information sciences, ontologies encode a conceptual model in a machine-readable way.
- All ontologies present a worldview.
- Mapping forms the data into a structure that allows for consistent representation of all things and the relationships between them.
- Most ontologies are either object-driven or event-driven.
- CIDOC CRM is an event-driven ontology designed to model cultural heritage data.
- CIDOC CRM’s extensions allow it to cover many more knowledge domains.
- Web Annotation Data Model (WADM) is a standard for the formatting and structuring of web annotations.
- Web Ontology Language (OWL) is a grammar that is used to express ontologies.
Resources
To learn more about ontologies, see the following resources:
- Bruseker (2019) “Learning Ontology & CIDOC CRM”
- Bruseker & Guillem (2021) “CIDOC CRM Game”
- Bruseker (2022) “Formal Ontologies: A Complete Novice’s Guide”
- CIDOC CRM (2022) “Compatible Models & Collaborations”
- CIDOC CRM (2022) “What is the CIDOC CRM?”
- Guarino, Oberle, & Staab (2009) “What Is an Ontology?”
- Noy & McGuinness (2001) “Ontology Development 101: A Guide to Creating Your First Ontology”