Why Historical Language Is a Challenge for Artificial Intelligence

One of the central challenges for Natural Language Processing (NLP) systems is deriving essential insights from a wide variety of written material. The sources contributing to a training dataset for a new NLP algorithm could be as linguistically diverse as Twitter, broadsheet newspapers, and scientific journals, with all the attendant eccentricities unique to each of those three sources alone. When an NLP algorithm has to consider material from multiple eras, it typically struggles to reconcile the very different ways that people speak or write across national and sub-national communities, and especially across different periods in history. Yet text data that straddles epochs (such as historical treatises and venerable scientific works) is a potentially useful resource for generating a historical overview of a topic, and for reconstructing statistical timelines that predate the adoption and maintenance of metrics in a domain. For example, the weather information that feeds predictive AI models of climate change was not adequately recorded around the world until 1880, whereas data-mining of classical texts offers older records of major meteorological events that may help supply pre-Victorian weather data.