Validate and Enhance
Introduction
In this step, you check that your transformed data meets the ontological and LOD standards needed for its inclusion in the LINCS Knowledge Graph.
Because your data is now in RDF, you can validate and enhance it in the same way regardless of which transformation workflow you followed.
Resources Needed
This step is a joint effort between LINCS and your research team. Your team should make an initial attempt at validating and enhancing your transformed data. When you think it is ready for the LINCS Knowledge Graph, or if you need help before that point, send your transformed data to LINCS and we will do an additional review of the data.
Some basic programming experience (e.g., undergraduate-level Python) can make this step easier. LINCS also provides tools, discussed below, that simplify some common validation and enhancement steps.
The time needed for this step depends on how ready your data is when it comes out of the Implement Conceptual Mapping step. Sometimes there are no errors to fix, and it is only a matter of a few hours of checking the data and minting entity Uniform Resource Identifiers (URIs). Other times you will find errors that trace back to your original data or to a certain transformation step, and you will need to spend a few weeks consulting with your team, making edits, and re-checking the data.
| Task | Research Team | Ontology Team | Transformation Team | Storage Team |
|---|---|---|---|---|
| Handle Data Changes | ✓ | ✓ | | |
| Validate Transformed Data | ✓ | ✓ | ✓ | ✓ |
| Enhance Transformed Data | ✓ | ✓ | ✓ | ✓ |
| Use Tools | ✓ | ✓ | ✓ | ✓ |
Handle Data Changes
If you find errors in this step and want to change your data, you have a few options:
- Change the RDF directly by editing the TTL file or by using the editing software of your choice.
  - For small changes, this could be done by hand.
  - For bulk changes, we recommend writing a simple script to make the changes or using our Linked Data Enhancement API.
  - Remember that if you make changes to the RDF by hand, you should not re-run the transformation workflow on the same data, or your manual changes may be overwritten.
- Make notes of the changes needed and wait to implement those changes until the data is in ResearchSpace.
- Make the changes to the source data or to the transformation step that introduced the error, and rerun the transformation workflow until the errors are gone.
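For the bulk-change option, a "simple script" can be as small as a function that swaps one URI for another everywhere it appears. The sketch below is a hypothetical helper, not a LINCS tool: it assumes the triples are already loaded as `(subject, predicate, object)` string tuples (in practice you would load and save the TTL file with an RDF library such as rdflib), and the example URIs are placeholders.

```python
# Minimal sketch of a bulk edit: replace every occurrence of one URI with another.
# Assumption: triples are (subject, predicate, object) string tuples; reading and
# writing the TTL file itself is left to an RDF library.

Triple = tuple[str, str, str]


def replace_uri(triples: list[Triple], old: str, new: str) -> list[Triple]:
    """Return a new, sorted list of triples with `old` replaced by `new` in any position."""

    def swap(term: str) -> str:
        return new if term == old else term

    # A set removes duplicates that appear when two URIs are merged into one.
    return sorted({(swap(s), swap(p), swap(o)) for s, p, o in triples})
```

For example, `replace_uri(data, "http://example.org/person/1a", "http://example.org/person/1")` merges an accidental duplicate person URI into the intended one, and any triples that become identical after the merge are kept only once.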
Validate Transformed Data
Below are validation steps you should perform on your transformed data. It is best to run these checks on a combined version of your data, with all of the triples in a single TTL file, so that you can detect missing information or logical inconsistencies across the entire dataset.
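If your transformation produced several output files, one way to build that combined version is to export each part as N-Triples (one triple per line) and join the lines while dropping duplicates. This is a sketch under that assumption, not a LINCS tool; it is only safe when the files contain no blank nodes, because blank node labels such as `_:b0` are meaningful only within a single file.

```python
# Minimal sketch: merge several N-Triples documents into one, dropping duplicate triples.
# Assumptions: every input is valid N-Triples and none of them uses blank nodes.


def merge_ntriples(documents: list[str]) -> str:
    seen: set[str] = set()
    merged: list[str] = []
    for doc in documents:
        for line in doc.splitlines():
            line = line.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            if line not in seen:
                seen.add(line)
                merged.append(line)
    return "\n".join(merged) + "\n"
```

Write the returned string to a single file and run the checks below against it; an RDF library can convert it back to Turtle if you prefer TTL.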
Entity Labels
Requirements:
- Every URI in your data must have at least one `rdfs:label` value.
  - The exception is if a URI is being used only as the object of an `owl:sameAs` relationship or is of `rdf:type crm:E73_Information_Object`. In these cases, the URI is being used as a link to an external source and is not meant to represent a searchable entity in ResearchSpace.
- When using external vocabulary terms in your data, add `rdfs:label` and `rdf:type` values for those terms from their source vocabularies to your data, so that you can query your data without needing to pull from additional sources at query time.
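The label requirement and its two exceptions can be checked mechanically. The following is a hypothetical sketch rather than LINCS tooling: it assumes triples are `(subject, predicate, object)` string tuples, entity URIs are written in full, and ontology terms use prefixed names such as `rdfs:label` and `crm:E21_Person`. Classes that appear only as objects of `rdf:type` are treated as ontology terms and skipped.

```python
# Minimal sketch: report entity URIs that lack an rdfs:label.
# Assumptions (not LINCS tooling): triples are (subject, predicate, object)
# string tuples, entity URIs are full http(s) IRIs, and ontology terms use
# prefixed names such as "rdfs:label" or "crm:E21_Person".

RDFS_LABEL = "rdfs:label"
RDF_TYPE = "rdf:type"
OWL_SAME_AS = "owl:sameAs"
INFORMATION_OBJECT = "crm:E73_Information_Object"


def is_uri(term: str) -> bool:
    """Treat full http(s) IRIs as URIs; literals and prefixed names are not."""
    return term.startswith(("http://", "https://"))


def missing_labels(triples: list[tuple[str, str, str]]) -> list[str]:
    labelled: set[str] = set()
    information_objects: set[str] = set()
    same_as_objects: set[str] = set()
    other_uses: set[str] = set()  # URIs used somewhere other than as an owl:sameAs object

    for s, p, o in triples:
        other_uses.add(s)  # any subject counts as a use outside owl:sameAs
        if p == RDFS_LABEL:
            labelled.add(s)
        elif p == RDF_TYPE:
            # Classes in the object position are ontology terms, so they are skipped.
            if o == INFORMATION_OBJECT:
                information_objects.add(s)
        elif is_uri(o):
            (same_as_objects if p == OWL_SAME_AS else other_uses).add(o)

    # Exception 1: URIs that appear only as the object of owl:sameAs.
    same_as_only = same_as_objects - other_uses
    candidates = {u for u in other_uses | same_as_objects if is_uri(u)}
    # Exception 2: URIs typed as crm:E73_Information_Object.
    return sorted(candidates - labelled - information_objects - same_as_only)
```

Run it on the combined dataset and fix, or deliberately justify, every URI it returns.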
Suggestions:
- You can add additional `rdfs:label` values for a single entity.
- You can add additional labels using `skos:altLabel` or `skos:prefLabel` to specify that there is a label or preference that is specific to your project.
- Whenever possible, include at least one English `rdfs:label` and one French `rdfs:label`.
- Whenever possible, add a language tag for each label and literal value (e.g., `"label"@en` and `"étiquette"@fr`).
- Try to use the same label formats as are used in existing LINCS data. See the LINCS LOD Style Guide (Coming Soon), which will describe the formats for such labels.
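The language suggestions can be checked the same way. This hypothetical sketch assumes literals are stored in their Turtle form (for example `"étiquette"@fr`) inside `(subject, predicate, object)` string tuples. It reports, for each labelled entity, which of the wanted languages are missing, and lists labels that carry no language tag at all. Regional tags such as `fr-CA` count as French.

```python
# Minimal sketch: find rdfs:label values without a language tag, and entities
# missing an English or French label.
# Assumption: literals are stored as Turtle-style strings such as '"étiquette"@fr'.
from __future__ import annotations

import re

# Matches a quoted literal followed by a language tag, capturing the tag.
LANG_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)')


def language_of(literal: str) -> str | None:
    match = LANG_LITERAL.fullmatch(literal)
    return match.group(1).lower() if match else None


def label_report(triples, wanted=("en", "fr")):
    langs: dict[str, set[str]] = {}
    untagged: list[tuple[str, str]] = []
    for s, p, o in triples:
        if p != "rdfs:label":
            continue
        langs.setdefault(s, set())
        lang = language_of(o)
        if lang is None:
            untagged.append((s, o))
        else:
            langs[s].add(lang.split("-")[0])  # "fr-ca" counts as "fr"
    missing = {
        s: sorted(set(wanted) - found)
        for s, found in langs.items()
        if set(wanted) - found
    }
    return missing, untagged
```

Because English and French labels are suggestions rather than requirements, treat the `missing` result as a to-do list rather than as errors.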
Entity Types
Requirements:
- Every URI in your data must have at least one `rdf:type` value declared.
  - The exception is if a URI is being used only as the object of an `owl:sameAs` relationship.
- The remaining type guidelines are specific to the conceptual mapping developed for your data.
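A hypothetical sketch of the type check, under the same assumptions as before (string-tuple triples, entity URIs written in full). It differs from the label check only in the predicate it looks for and in having a single exception.

```python
# Minimal sketch: report entity URIs that have no rdf:type.
# Assumptions: triples are (subject, predicate, object) string tuples and
# entity URIs are full http(s) IRIs; prefixed names are ontology terms.


def is_uri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def missing_types(triples: list[tuple[str, str, str]]) -> list[str]:
    typed: set[str] = set()
    same_as_objects: set[str] = set()
    other_uses: set[str] = set()  # URIs used somewhere other than as an owl:sameAs object
    for s, p, o in triples:
        other_uses.add(s)
        if p == "rdf:type":
            typed.add(s)  # class objects of rdf:type are ontology terms, not checked
        elif is_uri(o):
            (same_as_objects if p == "owl:sameAs" else other_uses).add(o)
    candidates = {u for u in other_uses | same_as_objects if is_uri(u)}
    # The only exception: URIs used solely as the object of owl:sameAs.
    same_as_only = same_as_objects - other_uses
    return sorted(candidates - typed - same_as_only)
```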
URI Validation
Requirements:
- Verify that the URIs in your data follow the correct format for each URI source. See common mistakes in the URIs and Prefixes section of our Data Cleaning Guide.
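A format check can be automated with one pattern per URI source. The two patterns below are assumptions for illustration only: they encode the canonical Wikidata entity URI (`http://www.wikidata.org/entity/Q…`) and the GeoNames semantic web URI with its trailing slash, and should be replaced with the rules in the Data Cleaning Guide before you rely on them. URIs from sources without a rule are not checked.

```python
# Minimal sketch: check URIs against a per-source format rule.
# Assumption: URI_RULES is illustrative; take the real rules from the Data Cleaning Guide.
import re
from urllib.parse import urlsplit

URI_RULES = {
    "www.wikidata.org": re.compile(r"http://www\.wikidata\.org/entity/Q[1-9][0-9]*"),
    "sws.geonames.org": re.compile(r"https://sws\.geonames\.org/[1-9][0-9]*/"),
}


def malformed_uris(uris: list[str]) -> list[str]:
    problems = []
    for uri in uris:
        host = urlsplit(uri).netloc.lower()
        rule = URI_RULES.get(host)
        # Only URIs from a source with a known rule are checked.
        if rule is not None and not rule.fullmatch(uri):
            problems.append(uri)
    return problems
```

With these rules, a Wikidata page URL such as `https://www.wikidata.org/wiki/Q42` or a GeoNames URI missing its trailing slash would be reported.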
Ontological Validation
- Verify that the relationships present in your transformed data match the mappings created in your Develop Conceptual Mapping step.
- Check back soon for details on future LINCS tools to help you validate CIDOC CRM data.
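Until dedicated tools are available, a rough version of this check is to copy the properties your mapping allows for each class into a table and flag any triple whose predicate is not on the list for its subject's types. The `ALLOWED_BY_CLASS` entries below are illustrative CIDOC CRM properties, not a LINCS mapping, and this sketch checks only predicates, not the classes at the object end.

```python
# Minimal sketch: flag triples whose predicate is not in your conceptual mapping.
# Assumptions: the mapping is hand-copied into ALLOWED_BY_CLASS (entries below
# are illustrative only) and triples are (subject, predicate, object) string tuples.

ALWAYS_ALLOWED = {"rdf:type", "rdfs:label", "owl:sameAs"}

ALLOWED_BY_CLASS = {
    "crm:E21_Person": {"crm:P1_is_identified_by", "crm:P98i_was_born"},
    "crm:E67_Birth": {"crm:P4_has_time-span", "crm:P7_took_place_at"},
}


def unexpected_predicates(triples, allowed_by_class=ALLOWED_BY_CLASS):
    types: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    problems = []
    for s, p, o in triples:
        if p in ALWAYS_ALLOWED:
            continue
        # Union of the predicates allowed for every class the subject belongs to.
        allowed: set[str] = set()
        for cls in types.get(s, set()):
            allowed |= allowed_by_class.get(cls, set())
        if p not in allowed:
            problems.append((s, p, o))
    return problems
```

Untyped subjects have no allowed predicates, so every relationship they carry is reported; that overlaps with the Entity Types check but is usually worth seeing.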
Logical Validation
- Manually look through your data and follow some relationship paths through the graph. This can help you do a quick sanity check on the data and spot errors.
- A common mistake is having a single URI accidentally represent multiple distinct entities, or using an overly specific entity where a more general one could have connected to multiple entities.
- Either through manual checks or with SPARQL queries, look for relationships that contradict one another, for example, conflicting family relationships or a person with a personal relationship to themselves.
- Check back soon for details on future LINCS tools to help you check for logical inconsistencies in your data.
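Two of the contradictions mentioned above are easy to detect in code. This sketch uses hypothetical predicate names (`ex:hasParent`, `ex:knows`) as stand-ins for whatever relationship properties your mapping uses. It finds relationships from an entity to itself and pairs of people recorded as each other's parent.

```python
# Minimal sketch of two logical checks.
# Assumptions: triples are (subject, predicate, object) string tuples, and
# "ex:hasParent" / "ex:knows" are hypothetical stand-ins for your own predicates.


def self_relationships(triples, predicates):
    """Triples where an entity is related to itself through one of `predicates`."""
    return [(s, p, o) for s, p, o in triples if p in predicates and s == o]


def mutual_parents(triples, parent_predicate="ex:hasParent"):
    """Pairs (a, b) where a is recorded as b's parent and b as a's parent."""
    pairs = {(s, o) for s, p, o in triples if p == parent_predicate}
    # a < b reports each pair once; self-parenthood is caught by self_relationships.
    return sorted((a, b) for a, b in pairs if (b, a) in pairs and a < b)
```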
Enhance Transformed Data
Requirements:
- If you have not already done so, add matched entity values into the data. For details, see our Entity Matching Guide.
- Remember to match entities against the LINCS Knowledge Graph as much as you can. This increases the number of connections between your project’s data and others’ data, and lets you see both your own and others’ contributions to the view of an entity.
- Before LINCS publishes your final data, all URIs in your data must be official LOD URIs from external sources, from your project, or be minted by LINCS. The Transformation Team will help you with minting.
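If your entity matching results live in a spreadsheet, adding them and checking which URIs still need minting can be scripted. In this hypothetical sketch the CSV column names `local_uri` and `external_uri` are assumptions, as is the list of accepted namespaces; the LINCS namespace shown (`http://id.lincsproject.ca/`) should be confirmed with the Transformation Team before use.

```python
# Minimal sketch: add owl:sameAs triples from a matching CSV and list URIs that
# are not yet from an accepted namespace.
# Assumptions: the CSV columns "local_uri" and "external_uri" and the accepted
# namespaces are illustrative, not a LINCS format.
import csv
import io


def same_as_from_csv(csv_text: str) -> list[tuple[str, str, str]]:
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        local = (row.get("local_uri") or "").strip()
        external = (row.get("external_uri") or "").strip()
        if local and external:  # rows without a match are skipped
            triples.append((local, "owl:sameAs", external))
    return triples


def unofficial_uris(uris, accepted_namespaces) -> list[str]:
    """URIs that do not start with any accepted namespace, e.g. temporary ones."""
    prefixes = tuple(accepted_namespaces)
    return sorted(u for u in set(uris) if not u.startswith(prefixes))
```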
Suggestions:
- To take advantage of ResearchSpace’s map visualizations, add coordinates for geographic locations, such as those identified by GeoNames URIs, following the format from our Data Cleaning Guide.
- Query the authorities from which you got external URIs to get additional entity labels and any additional information you want included in your dataset.
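As one example of querying an authority, Wikidata serves each entity as JSON at `Special:EntityData`. The sketch below turns the English and French labels in such a response into `rdfs:label` triples. It covers Wikidata only; other authorities need their own code. The User-Agent string is a placeholder, and JSON string escaping is close to, but not guaranteed identical to, Turtle's.

```python
# Minimal sketch: pull English and French labels for a Wikidata entity.
# Assumptions: Wikidata's Special:EntityData JSON is the source, and the
# User-Agent value is a placeholder to replace with your project's details.
import json
from urllib.request import Request, urlopen

WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def entity_data_url(qid: str) -> str:
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def labels_from_entity_json(data: dict, qid: str, languages=("en", "fr")):
    """Convert the labels in an EntityData response into rdfs:label triples."""
    labels = data["entities"][qid].get("labels", {})
    triples = []
    for lang in languages:
        if lang in labels:
            # JSON string escaping is close to, but not guaranteed identical to, Turtle's.
            literal = json.dumps(labels[lang]["value"], ensure_ascii=False)
            triples.append((WIKIDATA_ENTITY + qid, "rdfs:label", f"{literal}@{lang}"))
    return triples


def fetch_labels(qid: str, languages=("en", "fr")):
    """Network call: download the entity's JSON and extract its labels."""
    request = Request(
        entity_data_url(qid),
        headers={"User-Agent": "my-project-label-fetcher/0.1 (you@example.org)"},
    )
    with urlopen(request) as response:
        data = json.load(response)
    return labels_from_entity_json(data, qid, languages)
```

Keeping the parsing separate from the download lets you test the conversion on a saved response without network access.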
Use Tools
Use LINCS’s tools to find inconsistencies in your data.
Linked Data Enhancement API
LINCS has bundled common post-processing functionalities so they can be easily applied to any LINCS-compliant RDF data. The functionalities include enhancing the data with entity matching results, enhancing the data with labels from external LOD sources, minting URIs for entities, and validating the structure of the RDF. For more information, see the Linked Data Enhancement API Documentation.
ResearchSpace
When the Transformation Team and your research team agree that the data is ready for ingestion into LINCS, the transformed data is handed over to the Storage Team to be uploaded to the LINCS triplestore as a trial. The review environment for the publishing platform ResearchSpace — called ResearchSpace Review — can then be used as a tool to further validate and improve the data. See the Publish LOD step for details about using ResearchSpace as the final transformation step.
SPARQL Queries
You can load your transformed data into a local triplestore or wait for LINCS to load it into the LINCS triplestore. Once in a triplestore, you can use SPARQL queries to find inconsistencies in your data.
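Most triplestores accept queries over HTTP following the SPARQL 1.1 Protocol. This sketch posts a query that lists subject URIs without an `rdfs:label`, skipping `crm:E73_Information_Object` links, and reads the standard SPARQL JSON results format. The endpoint URL is hypothetical, the query assumes the standard CIDOC CRM namespace, and because it only inspects subjects, URIs that appear solely as objects are not covered.

```python
# Minimal sketch: send a SPARQL SELECT query to a triplestore over HTTP.
# Assumptions: ENDPOINT is a hypothetical local endpoint; replace it with the
# URL your triplestore exposes.
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:3030/my-dataset/sparql"  # hypothetical

# Subject URIs that have no rdfs:label, skipping E73 information objects.
MISSING_LABELS_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT DISTINCT ?entity WHERE {
  ?entity ?p ?o .
  FILTER(isIRI(?entity))
  FILTER NOT EXISTS { ?entity rdfs:label ?label }
  FILTER NOT EXISTS { ?entity a crm:E73_Information_Object }
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """POST the query as a form field, as the SPARQL 1.1 Protocol allows."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(endpoint, data=body, method="POST", headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    })


def values_from_results(results: dict, var: str) -> list[str]:
    """Pull one variable's values out of SPARQL JSON results."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def run_query(endpoint: str, query: str, var: str) -> list[str]:
    """Network call: run the query and return the values bound to `var`."""
    with urlopen(build_request(endpoint, query)) as response:
        return values_from_results(json.load(response), var)
```

With a local triplestore running, `run_query(ENDPOINT, MISSING_LABELS_QUERY, "entity")` returns the URIs to fix; other checks from this page can be written as further queries in the same way.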