Optimal binning: mathematical programming formulation

Navas-Palencia, Guillermo

arXiv.org Machine Learning 

January 23, 2020 Abstract The optimal binning is the optimal discretization of a variable into bins given a discrete or continuous numeric target. We present a rigorous and extensible mathematical programming formulation to solving the optimal binning problem for a binary, continuous and multi-class target type, incorporating constraints not previously addressed. For all three target types, we introduce a convex mixed-integer programming formulation. Several algorithmic enhancements such as automatic determination of the most suitable monotonic trend via a Machine-Learning-based classifier and implementation aspects are thoughtfully discussed. The new mathematical programming formulations are carefully implemented in the open-source python library OptBinning. 1 Introduction Binning (grouping or bucketing) is a technique to discretize the values of a continuous variable into bins (groups or buckets). From a modeling perspective, the binning technique may address prevalent data issues such as the handling of missing values, the presence of outliers and statistical noise, and data scaling. Furthermore, the binning process is a valuable interpretable tool to enhance the understanding of the nonlinear dependence between a variable and a given target while reducing the model complexity. Ultimately, resulting bins can be used to perform data transformations. Binning techniques are extensively used in machine learning applications, exploratory data analysis and as an algorithm to speed up learning tasks; recently, binning has been applied to accelerate learning in gradient boosting decision tree [12].

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found