Vocabularies

A vocabulary is a list of related terms and their definitions, usually for a particular discipline. Vocabularies are often created by groups of people who share a common interest and/or problem. At LINCS, vocabularies are used alongside ontologies to add domain-specific meaning to the data.

Taxonomies, thesauri, and controlled vocabularies are specific kinds of vocabularies. Each type meets a specific need and suits particular applications.

Controlled Vocabularies

A controlled vocabulary is a standardized, organized set of words and phrases used to describe data consistently. A controlled vocabulary is not organized in a hierarchy, but it provides concise and consistent language, which makes search and retrieval easier. The Virtual International Authority File (VIAF) is an example of a controlled vocabulary.
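As a minimal sketch of why consistent language helps retrieval, the Python below assumes a hypothetical controlled vocabulary of subject terms (the terms and record titles are invented for illustration). Records may only be tagged with approved terms, so a single exact-match search finds every relevant record.

```python
# Hypothetical controlled vocabulary: the only subject terms catalogers may use.
CONTROLLED_TERMS = {"Poetry", "Drama", "Prose fiction"}

def tag_record(title: str, term: str) -> dict:
    """Attach a subject term to a record, rejecting anything outside the vocabulary."""
    if term not in CONTROLLED_TERMS:
        raise ValueError(f"{term!r} is not an approved term")
    return {"title": title, "subject": term}

def search(records: list[dict], term: str) -> list[str]:
    """Exact-match search works because every record spells the term the same way."""
    return [r["title"] for r in records if r["subject"] == term]

records = [
    tag_record("Record A", "Drama"),
    tag_record("Record B", "Poetry"),
    tag_record("Record C", "Drama"),
]
print(search(records, "Drama"))  # ['Record A', 'Record C']

try:
    tag_record("Record D", "plays")  # a free-text variant is rejected
except ValueError as err:
    print(err)  # 'plays' is not an approved term
```

Rejecting variants at data-entry time is what keeps later searches simple: nobody has to guess whether a record was tagged "plays", "Plays", or "drama".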

Taxonomies

A taxonomy is a vocabulary that organizes concepts based on their characteristics and/or differences. A taxonomy identifies hierarchical relationships: high-level categories are broken down into sub-categories, and those into further sub-categories, in a branching tree that extends all the way down to individual entities. A taxonomy does not represent relationships across categories, only hierarchical relationships within a category (between sub-categories and their super-categories, or vice versa). Examples of taxonomies include the Dewey Decimal Classification System (DDC) and the Library of Congress Classification System (LCC).
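A taxonomy's single-parent structure can be sketched in a few lines of Python. The categories below are hypothetical, not taken from the DDC or LCC; each term records only its one broader parent, so you can walk up or down a branch but cannot link terms across branches.

```python
# Hypothetical taxonomy: each term points to exactly one broader parent.
PARENT = {
    "Buildings": "Places",
    "Open spaces": "Places",
    "Churches": "Buildings",
    "Theatres": "Buildings",
    "Gardens": "Open spaces",
}

def path_to_root(term: str) -> list[str]:
    """Walk up the single-parent chain from a term to its top-level category."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def children(term: str) -> list[str]:
    """Return the sub-categories directly beneath a term, sorted alphabetically."""
    return sorted(child for child, parent in PARENT.items() if parent == term)

print(path_to_root("Theatres"))  # ['Theatres', 'Buildings', 'Places']
print(children("Buildings"))     # ['Churches', 'Theatres']
```

Because the structure stores only parent links, it can express sub-category and super-category relationships but nothing between, say, "Theatres" and "Gardens", which mirrors the limitation described above.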

Thesauri

In information systems, a thesaurus is a structured vocabulary that shows three basic kinds of relationships between concepts: hierarchical, associative, and equivalence. Beyond terms that are broader and narrower than others, a thesaurus also provides terms that are synonymous, antonymous, or otherwise related. Examples of thesauri include the Art & Architecture Thesaurus (AAT) and the Thesaurus of Geographic Names (TGN).
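The three relationship types can be pictured as fields on a thesaurus entry. This Python sketch uses hypothetical terms (not drawn from the AAT or TGN), models equivalence as a table that routes non-preferred variants to a preferred term, and derives narrower terms as the inverse of broader ones.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    preferred: str
    broader: set[str] = field(default_factory=set)   # hierarchical
    related: set[str] = field(default_factory=set)   # associative
    variants: set[str] = field(default_factory=set)  # equivalence (non-preferred terms)

# Hypothetical entries for illustration only.
ENTRIES = [
    Term("Theatres", broader={"Performance venues"}, related={"Acting companies"},
         variants={"Theaters", "Playhouses"}),
    Term("Performance venues"),
    Term("Acting companies", related={"Theatres"}),
]

BY_LABEL = {t.preferred: t for t in ENTRIES}
# Equivalence: every variant, and every preferred term itself, resolves to a preferred term.
PREFERRED = {v: t.preferred for t in ENTRIES for v in t.variants} | {
    t.preferred: t.preferred for t in ENTRIES
}

def lookup(label: str) -> Term:
    """Resolve any label, preferred or variant, to its full entry."""
    return BY_LABEL[PREFERRED[label]]

def narrower(label: str) -> set[str]:
    """Narrower terms are computed as the inverse of broader links."""
    return {t.preferred for t in ENTRIES if label in t.broader}

entry = lookup("Playhouses")
print(entry.preferred)                 # Theatres
print(sorted(entry.broader))           # ['Performance venues']
print(sorted(entry.related))           # ['Acting companies']
print(narrower("Performance venues"))  # {'Theatres'}
```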

Vocabularies and Ontologies

Vocabularies and ontologies are related but are not the same. There are a few things about ontologies that are useful to know in order to better understand the role of vocabularies. For an in-depth introduction to ontologies, see the Ontologies page.

Although vocabularies and ontologies are often defined in overlapping ways in the Digital Humanities (DH), the differences between the two matter at LINCS and when working with linked open data (LOD) more generally. For example, consider this triple:

Figure: An example of a triple. The subject, Margaret Laurence, is the author of the object, The Stone Angel.

An ontology encodes a conceptual model. It defines the predicate (“is the author of”) along with the predicate’s domain and range. In the subject-predicate-object relationship, the domain is the class of entities allowed as the subject, and the range is the class of entities allowed as the object.

The ontology categorizes entities into classes and predicates into properties. It then tells us which classes of entities (subjects) are allowed to relate to which classes of entities (objects) via which type of property (predicates). In this triple, the ontology is what tells us that the “Margaret Laurence” entity is allowed to be related to the “The Stone Angel” entity via the relationship “is the author of.”
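To make the domain-and-range idea concrete, here is a minimal Python sketch that checks a triple against a property's domain and range. The class names “Person” and “Work” and the entity-to-class table are hypothetical simplifications, not CIDOC CRM classes.

```python
# Hypothetical miniature ontology; class and property names are illustrative only.
PROPERTIES = {
    # property: (domain class, range class)
    "is the author of": ("Person", "Work"),
}

# The ontology's classes, assigned to individual entities.
ENTITY_CLASS = {
    "Margaret Laurence": "Person",
    "The Stone Angel": "Work",
}

def is_allowed(subject: str, predicate: str, obj: str) -> bool:
    """True if the subject's class matches the domain and the object's class matches the range."""
    domain, range_ = PROPERTIES[predicate]
    return ENTITY_CLASS[subject] == domain and ENTITY_CLASS[obj] == range_

print(is_allowed("Margaret Laurence", "is the author of", "The Stone Angel"))  # True
print(is_allowed("The Stone Angel", "is the author of", "Margaret Laurence"))  # False
```

In RDF Schema, rdfs:domain and rdfs:range are formally used to infer the classes of a triple's subject and object rather than to reject triples; the check above follows the “allowed” framing used on this page.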

In contrast, a vocabulary refines an ontology to give its generic concepts meaning in different domains. Take the CIDOC CRM ontology, where the class “E41 Appellation” is a class for names. The ontology lists all of the relationships an entity of the “E41 Appellation” class is allowed to have with entities of other classes. For instance, the class “E21 Person” can be connected to the class “E41 Appellation” via the property “P1 is identified by”:

Figure: The domain, Person, is identified by the range, Appellation.

The class “E41 Appellation” may hold different things, depending on the domain in which it is used. This is where a vocabulary comes in: it specifies what kinds of things are allowed in the class “E41 Appellation” for a specific dataset.

Each of the following could be classed as an “E41 Appellation” as applied in a different domain or discipline:

“Kate LeBere” (a real person) in a dataset of LINCS collaborators.
“Confederation Bridge” (a place, landmark, and piece of roadworks infrastructure) in a dataset of city architecture.
“Lives of Girls and Women” (a book and short story) in a dataset of Canadian literature.
“Information Science” (a name of a discipline) in a dataset of university programs.

Keeping the vocabulary separate from the ontology means that LINCS can use different vocabularies to add nuance and meaning to datasets that share the same class in an ontology.
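The Python sketch below represents triples as plain tuples to show one ontology class carrying different vocabulary meanings. It assumes the vocabulary term is attached to each name with CIDOC CRM's P2 has type property (whose range is E55 Type); the dataset names, node identifiers, and example.org vocabulary URIs are all hypothetical.

```python
# Shared ontology terms from CIDOC CRM, written as prefixed names.
E41_APPELLATION = "crm:E41_Appellation"
P2_HAS_TYPE = "crm:P2_has_type"

# Hypothetical datasets and vocabulary terms; the example.org URIs are placeholders.
names = [
    ("Kate LeBere", "collaborators", "http://example.org/vocab/personal-name"),
    ("Confederation Bridge", "architecture", "http://example.org/vocab/structure-name"),
    ("Lives of Girls and Women", "canlit", "http://example.org/vocab/title"),
    ("Information Science", "programs", "http://example.org/vocab/discipline-name"),
]

triples = []
for label, dataset, vocab_term in names:
    node = f"ex:{dataset}/appellation"  # hypothetical node identifier
    triples.append((node, "rdf:type", E41_APPELLATION))  # same ontology class in every dataset
    triples.append((node, P2_HAS_TYPE, vocab_term))      # vocabulary term supplies domain meaning
    triples.append((node, "rdfs:label", label))

classes = {o for _, p, o in triples if p == "rdf:type"}
types = {o for _, p, o in triples if p == P2_HAS_TYPE}
print(classes)     # {'crm:E41_Appellation'}
print(len(types))  # 4
```

All four names share one class, so queries written against the ontology still work across datasets, while the vocabulary term distinguishes a personal name from a title or a discipline name.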

Vocabulary Development

While there are many existing vocabularies, sometimes LINCS needs to create a new vocabulary for a particular group of researchers. For example, the EML Place Types vocabulary was developed in partnership with three DH projects: REED London Online (REED), Map of Early Modern London (MoEML), and The Digital Ark. These projects each had data on locations in early modern London. One feature of the locations that the researchers wanted to capture in their LOD was their “type” (e.g., Blackfriars is a special type of jurisdiction known as a “liberty”). Since there was no existing vocabulary that contained all of the terms the researchers needed, a custom vocabulary was developed.

Below is a visualization of the vocabulary that was developed:

Visualization of the Early Modern London Place Types vocabulary

Whenever possible, researchers should use an established vocabulary rather than create a new one from scratch. Constructing a rich vocabulary is both time-consuming and labor-intensive. Using established vocabularies also increases the number of connections between LINCS data and established authorities, which is essential for LINCS’s participation in the larger linked data ecosystem.

If a new vocabulary needs to be developed because there are no suitable alternatives, researchers and LINCS team members will need to collaborate to construct the vocabulary together.

Simple Knowledge Organization System (SKOS)

LINCS uses the Simple Knowledge Organization System (SKOS) to express its vocabularies. SKOS is a standard that provides a way to represent thesauri, taxonomies, and controlled vocabularies following the Resource Description Framework (RDF). SKOS allows LINCS’s vocabularies to link to other vocabularies and RDF datasets.

In SKOS, each vocabulary term is considered to be a concept. Each concept is given a Uniform Resource Identifier (URI) (e.g., the URI for “Bookshop” in the EML vocabulary is http://id.lincsproject.ca/eml/Bookshop). Once concepts are declared, SKOS allows you to declare relationships between concepts, such as hierarchical (skos:broader and skos:narrower) and associative (skos:related). In addition, SKOS gives you the ability to add documentation to your vocabulary, such as general documentation (skos:note), a brief summary of the intended meaning of a concept (skos:scopeNote), and a complete summary of the intended meaning of a concept (skos:definition). At LINCS, all vocabulary terms are defined as instances of the class “E55 Type” in CIDOC CRM, which allows them to be used with the transformed datasets.
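As a rough illustration, the Python below writes SKOS concepts out as Turtle using only string formatting. Only the Bookshop URI comes from the EML vocabulary; the broader “Shop” concept, its example.org URI, and the note text are hypothetical placeholders. It types each concept as both skos:Concept and CIDOC CRM E55 Type, and fills in skos:narrower as the inverse of skos:broader.

```python
# Only the Bookshop URI is from the EML vocabulary; everything else is a placeholder.
BOOKSHOP = "http://id.lincsproject.ca/eml/Bookshop"
SHOP = "http://example.org/demo/Shop"

concepts = {
    BOOKSHOP: {"prefLabel": "Bookshop", "broader": [SHOP], "scopeNote": "Placeholder scope note."},
    SHOP: {"prefLabel": "Shop", "broader": []},
}

def with_narrower(data: dict) -> dict:
    """Return a copy with skos:narrower filled in as the inverse of each skos:broader link."""
    out = {uri: {**c, "narrower": []} for uri, c in data.items()}
    for uri, c in data.items():
        for parent in c["broader"]:
            out[parent]["narrower"].append(uri)
    return out

def to_turtle(data: dict) -> str:
    """Serialize concepts as Turtle; assumes labels and notes contain no double quotes."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .",
        "",
    ]
    for uri, c in data.items():
        props = ["a skos:Concept, crm:E55_Type", f'skos:prefLabel "{c["prefLabel"]}"@en']
        props += [f"skos:broader <{b}>" for b in c["broader"]]
        props += [f"skos:narrower <{n}>" for n in c["narrower"]]
        if "scopeNote" in c:
            props.append(f'skos:scopeNote "{c["scopeNote"]}"@en')
        lines.append(f"<{uri}> " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)

print(to_turtle(with_narrower(concepts)))
```

In practice an RDF library would handle escaping and serialization; plain strings are used here only to keep the sketch dependency-free.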

Summary

A vocabulary is a list of related terms with associated definitions, usually for a particular discipline.
Vocabularies can take a few different forms, including controlled vocabularies, taxonomies, and thesauri.
There is often overlap in how vocabularies and ontologies are defined in the Digital Humanities (DH), but LINCS treats them as distinct.
Ontologies group entities into classes and predicates into properties, which tells you which classes of entities (subjects) are allowed to relate to which classes of entities (objects) via which type of property (predicates).
Vocabularies refine ontologies to give their generic concepts meaning in different domains.
Sometimes LINCS needs to develop a vocabulary for a particular group of researchers.
LINCS uses the Simple Knowledge Organization System (SKOS) to declare its vocabularies.

Resources

To learn more about vocabularies, see the following resources:

EML Place Types Vocabulary
Linked Open Vocabularies (2022) “Linked Open Vocabularies (LOV)”
SemWebTec (2010) “Controlled Vocabulary vs Ontology”
W3C (2012) “SKOS Simple Knowledge Organization System”
W3C (2015) “What is a Vocabulary?”
Zaytseva (2020) “Controlled Vocabularies and SKOS”
