Semi-Structured Data

Semi-Structured Data has some structure, but not in a form that makes it easy to extract entities and relationships without manual work. It typically takes the form of XML documents. For example, an XML document containing natural language text that has been heavily annotated with XML tags may be considered semi-structured data. These tags may identify some entities and some relationships between entities, which could then be turned into Linked Open Data (LOD) using a combination of custom scripts, additional manual annotation, and vetting.

Examples

  • Simplified excerpt from the Orlando Project data, which entered its conversion process as hand-annotated XML documents:
<DATE>By March 1643</DATE>, early in this year of fierce <TOPIC>Civil War</TOPIC> fighting <NAME>Dorothy Osborne</NAME>'s mother moved with her children from <PLACE>Chicksands</PLACE> to the fortified port of <PLACE>St Malo</PLACE>.
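
As a rough illustration of the "custom scripts" step, the sketch below pulls the tagged entities out of the excerpt above using Python's standard library. It is only a minimal example under stated assumptions, not the Orlando Project's actual conversion pipeline: the `EVENT` wrapper element and the `extract_entities` function are hypothetical names introduced here so that the fragment parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# The simplified excerpt from above, stored as a plain string.
EXCERPT = (
    "<DATE>By March 1643</DATE>, early in this year of fierce "
    "<TOPIC>Civil War</TOPIC> fighting <NAME>Dorothy Osborne</NAME>'s "
    "mother moved with her children from <PLACE>Chicksands</PLACE> to "
    "the fortified port of <PLACE>St Malo</PLACE>."
)


def extract_entities(fragment):
    """Return (tag, text) pairs for each annotated span in the fragment."""
    # Assumption: the fragment has no single root element, so we wrap it
    # in a hypothetical <EVENT> element to make it well-formed XML.
    root = ET.fromstring(f"<EVENT>{fragment}</EVENT>")
    # Iterating over the root yields its direct children in document order.
    return [(child.tag, "".join(child.itertext()).strip()) for child in root]


entities = extract_entities(EXCERPT)
for tag, text in entities:
    print(f"{tag}: {text}")
```

The script recovers the entities (a date, a topic, a name, and two places) but not how they relate. Nothing in the markup says that it was Dorothy Osborne's mother, rather than Dorothy Osborne herself, who moved, or that Chicksands is the origin and St Malo the destination. Filling in those relationships is where the additional manual annotation and vetting come in.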