Natural Language Data

Natural language data is data in a free-text format. For LINCS, this means a document of full sentences written in modern English, ideally following common grammatical rules. It includes documents that are entirely plain text, such as a written biography saved as a TXT file, as well as documents in other formats that contain embedded plain text.


  • Simplified excerpt from the Orlando Project data, where we extracted the natural language text from XML documents:
By March 1643, early in this year of fierce Civil War fighting, Dorothy Osborne's mother moved with her children from Chicksands to the fortified port of St Malo.
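
To illustrate what pulling natural language text out of an XML document can look like, here is a minimal Python sketch using only the standard library. The markup in SAMPLE_XML and the element names (p, date, name, place) are invented for illustration and are not the Orlando Project's actual schema; the function name extract_plain_text is likewise hypothetical. Real source documents would likely need extra handling for notes, metadata, and other non-prose elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names are assumptions made for this
# example and do not reflect the Orlando Project's real XML schema.
SAMPLE_XML = (
    "<p>By <date>March 1643</date>, early in this year of fierce Civil War "
    "fighting, <name>Dorothy Osborne</name>'s mother moved with her children "
    "from <place>Chicksands</place> to the fortified port of "
    "<place>St Malo</place>.</p>"
)


def extract_plain_text(xml_string: str) -> str:
    """Return the text content of an XML fragment with all tags removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields every piece of text in document order, including
    # the text that follows each nested element.
    text = "".join(root.itertext())
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(text.split())


plain_text = extract_plain_text(SAMPLE_XML)
print(plain_text)
```

Run on the sample above, this prints the same sentence as the excerpt, with the tags removed and the words left unchanged.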