An Aggregate and Iterative Disaggregate Algorithm with Proven Optimality in Machine Learning
Park, Young Woong, Klabjan, Diego
In this paper, we propose a clustering-based iterative algorithm to solve certain optimization problems in machine learning when data size is large and thus it becomes impractical to use out-of-the-box algorithms. We rely on the principle of data aggregation and then subsequent disaggregations. While it is standard practice to aggregate the data and then calibrate the machine learning algorithm on aggregated data, we embed this into an iterative framework where initial aggregations are gradually disaggregated to the extent that even an optimal solution is obtainable. Early studies in data aggregation consider transportation problems [1, 10], where either demand or supply nodes are aggregated. Zipkin [31] studied data aggregation for linear programming (LP) and derived error bounds of the approximate solution.
Jul-5-2016