Helping Novices Avoid the Hazards of Data: Leveraging Ontologies to Improve Model Generalization Automatically with Online Data Sources
Janpuangtong, Sasin (Texas A&M University) | Shell, Dylan A. (Texas A&M University)
This article describes an end-to-end learning framework that allows a novice to create models from data easily by helping structure the model building process and capturing extended aspects of domain knowledge. By treating the whole modeling process interactively and exploiting high-level knowledge in the form of an ontology, the framework is able to aid the user in a number of ways, including in helping to avoid pitfalls such as data dredging. We describe how the framework automatically exploits structured knowledge in an ontology to identify relevant concepts, and how a data extraction component can make use of online data sources to find measurements of those concepts so that their relevance can be evaluated. Prediction error on unseen examples of these models show that our framework, making use of the ontology, helps to improve model generalization.