Multivariate Convex Regression at Scale
We present new large-scale algorithms for fitting a multivariate convex regression function to $n$ samples in $d$ dimensions---a key problem in shape constrained nonparametric regression with widespread applications in engineering and the applied sciences. The infinite-dimensional learning task can be expressed via a convex quadratic program (QP) with $O(nd)$ decision variables and $O(n^2)$ constraints. While instances with $n$ in the lower thousands can be addressed with current algorithms within reasonable runtimes, solving larger problems (e.g., $n\approx 10^4$ or $10^5$) are computationally challenging. To this end, we present an active set type algorithm on the Lagrangian dual (of a perturbation) of the primal QP. For computational scalability, we perform approximate optimization of the reduced sub-problems; and propose a variety of randomized augmentation rules for expanding the active set. Although the dual is not strongly convex, we present a novel linear convergence rate of our algorithm on the dual. We demonstrate that our framework can solve instances of the convex regression problem with $n=10^5$ and $d=10$---a QP with 10 billion variables---within minutes; and offers significant computational gains (e.g., in terms of memory and runtime) compared to current algorithms.
May-23-2020
- Country:
- North America > United States > Massachusetts > Middlesex County (0.14)
- Genre:
- Research Report > New Finding (0.46)