Helping Novices Avoid the Hazards of Data: Leveraging Ontologies to Improve Model Generalization Automatically with Online Data Sources

AI Magazine 

The infrastructure and tools necessary for largescale data analytics, formerly the exclusive purview of experts, are increasingly available. Whereas a knowledgeable data miner or domain expert can rightly be expected to exercise caution when required (for example, around misleading conclusions supposedly supported by the data), the nonexpert may benefit from some judicious assistance. This article describes an end-to-end learning framework that allows a novice to create models from data easily by helping structure the model-building process and capturing extended aspects of domain knowledge. By treating the whole modeling process interactively and exploiting high-level knowledge in the form of an ontology, the framework is able to aid the user in a number of ways, including in helping to avoid pitfalls such as data dredging. Prudence must be exercised to avoid these hazards as certain conclusions may only be supported if, for example, there is extra knowledge that gives reason to ...