 numerical attribute


Non-Mathematical Feature Engineering techniques for Data Science

#artificialintelligence

"Apply Machine Learning like the great engineer you are, not like the great Machine Learning expert you aren't." This is the first sentence of a Google-internal document I read about how to apply ML. In my limited experience working as a server/analytics guy, data (and how to store and process it) has always demanded the most consideration and had the most impact on the overall pipeline. Ask any Kaggle winner, and they will usually say that the biggest gains come from being smart about representing data rather than from using some complex algorithm. Even the CRISP-DM data mining process has not one but two stages dedicated solely to data understanding and preparation.


Correlation-Based Refinement of Rules with Numerical Attributes

AAAI Conferences

Learning rules is a common way of extracting useful information from knowledge or data bases. Many such data sets contain numerical attributes. However, approaches like ILP or association rule mining are optimized for data with categorical values, and considering numerical attributes is expensive. In this paper, we present an extension to the top-down ILP algorithm, which enables an efficient discovery of datalog rules from data with both numerical and categorical attributes. Our approach comprises a preprocessing phase for computing the correlations between numerical and categorical attributes, as well as an extension to the ILP refinement step, which enables us to detect interesting candidate rules and to suggest refinements with relevant attribute combinations. We report on experiments with U.S. Census data, Freebase and DBpedia, and show that our approach helps to efficiently discover rules with numerical intervals.
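
To make the preprocessing idea concrete, here is a minimal sketch of scoring how strongly a categorical attribute is associated with a numerical one. The abstract does not say which correlation measure the paper uses, so the correlation ratio (eta squared) below is an assumed stand-in, and the function name `correlation_ratio` and the toy data are hypothetical.

```python
from collections import defaultdict


def correlation_ratio(categories, values):
    """Return eta squared: the share of variance in `values` explained by `categories`.

    Assumption: this measure is only an illustration; the paper's own
    correlation measure is not stated in the abstract.
    """
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    if not values:
        return 0.0

    # Group the numerical values by their categorical label.
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    overall_mean = sum(values) / len(values)
    ss_total = sum((v - overall_mean) ** 2 for v in values)
    if ss_total == 0:
        # A constant numerical attribute cannot be explained by anything.
        return 0.0

    # Between-group sum of squares, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - overall_mean) ** 2
        for group in groups.values()
    )
    return ss_between / ss_total


# Hypothetical toy data: the category fully determines the value.
print(correlation_ratio(["a", "a", "b", "b"], [1, 1, 3, 3]))
# Hypothetical toy data: the category says nothing about the value.
print(correlation_ratio(["a", "a", "b", "b"], [1, 3, 1, 3]))
```

A score near 1 would suggest that restricting a rule to one category narrows the numerical attribute to a tight interval, which is the kind of attribute combination a refinement step might prioritize; a score near 0 suggests the combination is unlikely to be worth exploring.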