OpenRefine

OpenRefine is a data processing application that allows you to clean up and transform structured data. It offers several features suited to creating Linked Open Data (LOD), including reconciliation, format translation, Resource Description Framework (RDF) mapping, and export options.

OpenRefine and LINCS

Within the LINCS project, OpenRefine is used for data cleaning and reconciliation, primarily by researchers bringing their own datasets to the project. It gives these domain experts full control over the changes made to their data.

OpenRefine is best suited to structured data because it represents data in a format similar to a spreadsheet or table. Tabular file types, such as comma-separated values (CSV), work best, though OpenRefine is also compatible with other file types such as XML, JSON, and RDF. If a researcher’s data falls within a certain domain or is unstructured, a different tool may be more appropriate:

  • Use LINCS-API or NERVE for an unstructured dataset.
  • Use VERSD to reconcile an entirely bibliographic dataset.
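
Because OpenRefine presents data as a table, it can help to confirm that a CSV file is consistently tabular before importing it. The following is a minimal Python sketch of such a check, not part of OpenRefine or LINCS: the function name `check_tabular` and the sample data are hypothetical, and the check assumes the first record is a header row that defines the expected number of columns.

```python
import csv
import io


def check_tabular(text, delimiter=","):
    """Return (expected_width, bad_records) for delimited text.

    expected_width is the number of columns in the first (header) record.
    bad_records lists (record_number, width) for every non-empty record
    whose column count differs from the header's.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return 0, []
    expected = len(rows[0])
    bad = [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        # Skip the header itself and any completely blank records.
        if number > 1 and row and len(row) != expected
    ]
    return expected, bad


# Made-up sample: the third record is missing its "place" value.
sample = "id,name,place\n1,Alpha,Toronto\n2,Beta\n"
width, bad = check_tabular(sample)
print(width, bad)
```

A file that reports no bad records has the same number of columns in every record, which is the table-like shape described above. A file that reports many may need cleanup first, or may be better suited to one of the other tools listed.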

The software can be downloaded from OpenRefine’s website. When launched, the application runs locally on your computer and opens in a browser tab.

Although OpenRefine can be useful for researchers and data specialists outside of LINCS, those who are getting their data into the LINCS system should begin cleaning and reconciling it in OpenRefine early in the data preparation process.

Interested in creating LOD and publishing it in the LINCS triplestore? See Publish Data with LINCS for more information.

Check out the Authority Service to reconcile your data against the LINCS Knowledge Graph from within OpenRefine.

Prerequisites

Users of OpenRefine:

  • Need to bring their own dataset
  • Need a basic understanding of reconciliation and data cleaning
  • Do not need to create a user account

OpenRefine supports the following inputs and outputs:

  • Input: CSV, TSV, XLS, XLSX, JSON, XML, RDF, plain text, and more
  • Output: CSV, TSV, XLS, XLSX, HTML-formatted tables, and more

Resources

To learn more about OpenRefine, see the following resources:

Clean Data:

Reconcile Entities:

Information about the team that developed OpenRefine is available on the Tool Credits page.