Transform More Data
Introduction
Once you have made it through a full transformation workflow and published your data in ResearchSpace, what happens when you have more data to add? You have a few options, depending on how the new data differs from the data you have already transformed.
Transformation Options
Edit your Data in ResearchSpace Review
If your new data consists of a relatively small number of new entities or connections between them, then editing directly in ResearchSpace Review is a good option.
ResearchSpace Review is the ResearchSpace environment with additional features for editing data and editing the project's landing page.
Register for a ResearchSpace Review account to obtain editing permissions.
Editing your data directly in ResearchSpace Review creates a new version that diverges from your original untransformed source data. If you want those changes to be reflected in versions of your data outside of ResearchSpace, you will need to apply the same changes there.
If you plan on continuing to make significant changes to your original data and want those changes to appear in ResearchSpace, contact LINCS to discuss options.
Rerun a Transformation Workflow on New Data
This option applies when you have a new batch of data that follows the same structure and contains the same relationships as your originally transformed batch.
- Structured Data
- Semi-Structured Data
- TEI Data
- Natural Language Data
For example, you may have new rows for the spreadsheet you originally transformed.
Here is how each step will need to change and be repeated:
- Export Data
  - Repeat the same process.
- Clean Data
  - If you used a script, run it on the new data.
  - If you made manual changes, you need to apply them to the new data. Note that if you used OpenRefine, you may be able to open the original project and export the change history from the Undo/Redo tab.
- Match Entities
  - Any entities that did not appear in the first batch need to be matched against external sources.
  - Any entities that appear in both the first batch and the new batch need to be matched against one another (i.e., use the same identifier for the same entity in both batches).
- Develop Conceptual Mapping
  - Because the structure of your data has not changed, you can reuse the conceptual mapping from your original Develop Conceptual Mapping step.
- Implement Conceptual Mapping
  - The script or template you used to implement that conceptual mapping will need to be rerun.
  - If you used 3M, you only need to replace the input file for your 3M mapping project with your new data and run the mapping again in 3M.
- Validate and Enhance
  - Again, either the script you used or the manual changes you made will need to be repeated.
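The Match Entities step above can be sketched in Python. This is a minimal illustration only, assuming that entities are compared by a normalized name label; real reconciliation usually needs more evidence than a name. The function names, the sample names, and the `http://example.org/` identifiers are all hypothetical.

```python
def normalize(label):
    """Collapse case and whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def match_against_first_batch(first_batch_ids, new_labels):
    """Split new-batch labels into already-identified and still-unmatched entities.

    first_batch_ids: dict mapping entity label -> identifier used in the first batch.
    new_labels: iterable of entity labels found in the new batch.
    """
    known = {normalize(label): uri for label, uri in first_batch_ids.items()}
    reused = {}        # label -> identifier carried over from the first batch
    needs_match = []   # labels that still need to be matched externally
    for label in new_labels:
        uri = known.get(normalize(label))
        if uri is not None:
            reused[label] = uri
        else:
            needs_match.append(label)
    return reused, needs_match


# Hypothetical identifiers assigned during the first transformation.
first_batch = {
    "Jane Doe": "http://example.org/entity/1",
    "John Smith": "http://example.org/entity/2",
}
reused, needs_match = match_against_first_batch(
    first_batch, ["jane  doe", "Mary Major"]
)
print(reused)       # entities that keep their first-batch identifier
print(needs_match)  # entities to match against external sources
```

Entities in `reused` keep the identifier from the first batch, while those in `needs_match` go through the same external matching you did originally.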
The repeatability of this workflow varies because this workflow looks different for every project.
If you know you will continue to have new data in the same format, then ideally you will set up a repeatable set of steps that use automated scripts or templates. If that is not possible, then you will have to repeat the more manual process for each new batch of data.
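As a rough sketch of what a repeatable cleaning step could look like, the Python below applies the same ordered list of cleaning functions to any batch of CSV rows. The column names (`name`, `birth_year`), the cleaning rules, and the sample rows are hypothetical; your own steps will depend on your data.

```python
import csv
import io


def strip_whitespace(row):
    """Trim leading and trailing whitespace from every cell."""
    return {key: value.strip() for key, value in row.items()}


def drop_empty_name(row):
    """Return None to remove rows that have no name."""
    return row if row["name"] else None


# The same ordered steps are applied to every batch.
CLEANING_STEPS = [strip_whitespace, drop_empty_name]


def clean_batch(csv_text):
    """Apply every cleaning step, in order, to each row of one batch."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for step in CLEANING_STEPS:
            row = step(row)
            if row is None:
                break  # a step rejected this row
        if row is not None:
            cleaned.append(row)
    return cleaned


# Hypothetical new batch with untidy spacing and one row missing a name.
new_batch = "name,birth_year\n Jane Doe ,1901\n,1950\nJohn Smith, 1920\n"
print(clean_batch(new_batch))
```

Keeping the steps in one list means a new batch only requires calling `clean_batch` again, and any change to the cleaning rules is made in one place.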
This process is likely to take the same amount of time as the original transformation.
New TEI data containing the same relationships can follow the same LINCS XTriples workflow as the first batch. As with the structured data workflow, the rest of the steps will need to be redone, whether by rerunning scripts, reusing the same tools, or redoing manual work.
This process is likely to take the same amount of time as the original transformation.
New natural language data containing the same relationships can follow the same workflow as the first batch. As with the structured data workflow, the rest of the steps will need to be redone, whether by rerunning scripts, reusing the same tools, or redoing manual work.
For new data, you still need to match entities against external sources, but now you also need to match entities against your already transformed data.
Run a New Transformation Workflow on New Data
If you have new data that does not have the same starting structure as the data you originally transformed, or if it contains many new relationships, then you will need to repeat the appropriate transformation workflow on the new data. You may be able to use an edited version of your original transformation workflow if there are similarities between the batches of data.
Publication Options
Your newly transformed data can be combined with data you already have in ResearchSpace so that it appears as a single project and named graph in the LINCS triplestore.
Alternatively, if it covers new subject matter and is part of a different research project, it can be published as a separate project in ResearchSpace and be stored in a different named graph.
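In terms of the data itself, the difference between these two options is which named graph the new triples are assigned to. The sketch below shows this with plain Python strings in N-Quads form, where the fourth term of each line names the graph. The graph, entity, and class URIs are hypothetical placeholders, not real LINCS identifiers, and the snippet does not load anything into a triplestore.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_nquads(triples, graph_uri):
    """Serialize (subject, predicate, object) URI triples as N-Quads lines in one named graph."""
    return [f"<{s}> <{p}> <{o}> <{graph_uri}> ." for s, p, o in triples]


# Hypothetical graph names for an existing project and a separate new project.
EXISTING_GRAPH = "http://example.org/graph/project-a"
SEPARATE_GRAPH = "http://example.org/graph/project-b"

# Hypothetical triple produced from the new batch of data.
new_triples = [
    ("http://example.org/entity/3", RDF_TYPE, "http://example.org/Person"),
]

# Option 1: the new data joins the graph of the already published project.
combined = to_nquads(new_triples, EXISTING_GRAPH)
# Option 2: the new data is kept in its own graph as a separate project.
separate = to_nquads(new_triples, SEPARATE_GRAPH)

print(combined[0])
print(separate[0])
```

The triples are identical in both cases; only the graph they belong to changes.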