
Collaborating Authors: Rockafellar




Convex Regression with a Penalty

Lim, Eunji

arXiv.org Machine Learning

A common way to estimate an unknown convex regression function $f_0: \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}$ from a set of $n$ noisy observations is to fit a convex function that minimizes the sum of squared errors. However, this estimator is known for its tendency to overfit near the boundary of $\Omega$, posing significant challenges in real-world applications. In this paper, we introduce a new estimator of $f_0$ that avoids this overfitting by minimizing a penalty on the subgradient while enforcing an upper bound $s_n$ on the sum of squared errors. The key advantage of this method is that $s_n$ can be directly estimated from the data. We establish the uniform almost sure consistency of the proposed estimator and its subgradient over $\Omega$ as $n \rightarrow \infty$ and derive convergence rates. The effectiveness of our estimator is illustrated through its application to estimating waiting times in a single-server queue.
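As a rough illustration (a minimal sketch assuming cvxpy as the solver; the paper's own algorithm may differ), the estimator can be written as a convex program over fitted values and subgradients at the data points:

```python
# A minimal sketch of the penalized estimator described above, assuming
# cvxpy; the paper's own algorithm and penalty may differ. We fit values
# theta_i = f(x_i) and subgradients g_i at the data points, minimize a
# squared-norm penalty on the subgradients, and enforce the SSE budget
# s_n plus the supporting-hyperplane (convexity) constraints.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 30, 1
X = rng.uniform(-1, 1, size=(n, d))
y = (X ** 2).sum(axis=1) + 0.1 * rng.standard_normal(n)  # noisy convex target

s_n = n * 0.1 ** 2        # SSE budget; the paper estimates this from data

theta = cp.Variable(n)    # fitted values
G = cp.Variable((n, d))   # subgradients at each x_i

constraints = [cp.sum_squares(y - theta) <= s_n]
for i in range(n):
    for j in range(n):
        if i != j:
            # convexity: theta_j >= theta_i + <g_i, x_j - x_i>
            constraints.append(theta[j] >= theta[i] + G[i] @ (X[j] - X[i]))

prob = cp.Problem(cp.Minimize(cp.sum_squares(G)), constraints)
prob.solve()
print("subgradient penalty:", round(prob.value, 4))
```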


Randomized Transport Plans via Hierarchical Fully Probabilistic Design

Boufelja Y., Sarah, Quinn, Anthony, Shorten, Robert

arXiv.org Machine Learning

An optimal randomized strategy for design of balanced, normalized mass transport plans is developed. It replaces -- but specializes to -- the deterministic, regularized optimal transport (OT) strategy, which yields only a certainty-equivalent plan. The incompletely specified -- and therefore uncertain -- transport plan is acknowledged to be a random process. Therefore, hierarchical fully probabilistic design (HFPD) is adopted, yielding an optimal hyperprior supported on the set of possible transport plans, and consistent with prior mean constraints on the marginals of the uncertain plan. This Bayesian resetting of the design problem for transport plans -- which we call HFPD-OT -- confers new opportunities. These include (i) a strategy for the generation of a random sample of joint transport plans; (ii) randomized marginal contracts for individual source-target pairs; and (iii) consistent measures of uncertainty in the plan and its contracts. An application in algorithmic fairness is outlined, where HFPD-OT enables the recruitment of a more diverse subset of contracts -- than is possible in classical OT -- into the delivery of an expected plan. Also, it permits fairness proxies to be endowed with uncertainty quantifiers.
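As a toy illustration of item (i) only, and emphatically not the paper's HFPD construction: one can draw random transport plans with the required marginals by perturbing the cost and re-solving entropic OT via Sinkhorn iterations.

```python
# Toy illustration only: NOT the HFPD construction of the paper.
# We draw a random sample of valid transport plans by perturbing the
# cost matrix and re-solving entropic OT with Sinkhorn iterations, so
# every draw respects the prescribed marginals (a, b).
import numpy as np

def sinkhorn(C, a, b, eps=0.05, iters=500):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]      # plan with marginals (a, b)

rng = np.random.default_rng(1)
m, n = 4, 5
a = np.full(m, 1.0 / m)                     # source marginal
b = np.full(n, 1.0 / n)                     # target marginal
C = rng.uniform(size=(m, n))                # base cost (illustrative)

plans = [sinkhorn(C + 0.1 * rng.standard_normal((m, n)), a, b)
         for _ in range(100)]               # a random sample of plans
P_mean = np.mean(plans, axis=0)             # an "expected plan"
P_std = np.std(plans, axis=0)               # crude per-contract uncertainty
print(P_mean.round(3))
print(P_std.round(3))
```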


An Efficient Semi-smooth Newton Augmented Lagrangian Method for Elastic Net

Boschi, Tobia, Reimherr, Matthew, Chiaromonte, Francesca

arXiv.org Machine Learning

Feature selection is an important and active research area in statistics and machine learning. The Elastic Net is often used to perform selection when the features present non-negligible collinearity or practitioners wish to incorporate additional known structure. In this article, we propose a new Semi-smooth Newton Augmented Lagrangian Method to efficiently solve the Elastic Net in ultra-high dimensional settings. Our new algorithm exploits both the sparsity induced by the Elastic Net penalty and the sparsity due to the second-order information of the augmented Lagrangian. This greatly reduces the computational cost of the problem. Through experiments on both synthetic and real datasets, we demonstrate that our approach outperforms its best competitors by at least an order of magnitude in terms of CPU time. We also apply our approach to a Genome-Wide Association Study on childhood obesity.
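The first kind of sparsity enters through the elastic net proximal operator, which has a simple closed form; a minimal numpy sketch with illustrative weights lam1 and lam2:

```python
# The elastic net proximal operator: soft-thresholding followed by
# shrinkage. lam1 and lam2 are the l1 and l2 penalty weights
# (illustrative values, not tied to the paper's experiments).
import numpy as np

def prox_elastic_net(x, t, lam1, lam2):
    # argmin_z  (1/(2t))||z - x||^2 + lam1 ||z||_1 + (lam2/2) ||z||^2
    soft = np.sign(x) * np.maximum(np.abs(x) - t * lam1, 0.0)
    return soft / (1.0 + t * lam2)

x = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
print(prox_elastic_net(x, t=1.0, lam1=0.4, lam2=0.5))
# some coordinates land exactly at zero, which keeps the Newton systems small
```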


Stochastic Online Learning with Probabilistic Graph Feedback

Li, Shuai, Chen, Wei, Wen, Zheng, Leung, Kwong-Sak

arXiv.org Machine Learning

We consider a problem of stochastic online learning with general probabilistic graph feedback, where each edge $(i,j)$ of the feedback graph carries a probability $p_{ij}$. Two cases are covered. (a) The one-step case: after playing arm $i$, the learner observes a reward sample of arm $j$ independently with probability $p_{ij}$. (b) The cascade case: after playing arm $i$, the learner observes feedback for all arms $j$ in a probabilistic cascade starting from $i$ -- for each $(i,j)$, if arm $i$ is played or observed, then a reward sample of arm $j$ is observed independently with probability $p_{ij}$. Previous works mainly focus on deterministic graphs, which correspond to the one-step case with $p_{ij} \in \{0,1\}$, on adversarial sequences of graphs with certain topological guarantees, or on specific types of random graphs. We analyze asymptotic lower bounds and design algorithms for both cases. The regret upper bounds of the algorithms match the lower bounds with high probability.
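To make the one-step model concrete, here is a small simulator; the arm means, the feedback probabilities, and the UCB-style use of the side observations are illustrative choices, not the algorithms analyzed in the paper:

```python
# One-step probabilistic graph feedback: after playing arm i, a reward
# sample of each arm j is revealed independently with probability p[i, j].
# The means mu and matrix p are made up for illustration.
import numpy as np

rng = np.random.default_rng(2)
K = 4
mu = np.array([0.3, 0.5, 0.6, 0.7])        # Bernoulli arm means
p = rng.uniform(0.2, 0.9, size=(K, K))     # edge probabilities p[i, j]
np.fill_diagonal(p, 1.0)                   # playing i always reveals arm i

counts = np.zeros(K)
sums = np.zeros(K)
for t in range(1, 5001):
    safe = np.maximum(counts, 1)
    ucb = sums / safe + np.sqrt(2.0 * np.log(t) / safe)
    i = int(np.argmax(ucb))
    revealed = rng.random(K) < p[i]        # one-step probabilistic feedback
    rewards = (rng.random(K) < mu).astype(float)
    counts += revealed
    sums += rewards * revealed
print("empirical means:", (sums / np.maximum(counts, 1)).round(3))
```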


Dual Smoothing and Level Set Techniques for Variational Matrix Decomposition

Aravkin, Aleksandr Y., Becker, Stephen

arXiv.org Machine Learning

We focus on the robust principal component analysis (RPCA) problem, and review a range of old and new convex formulations for the problem and its variants. We then review dual smoothing and level set techniques in convex optimization, present several novel theoretical results, and apply the techniques on the RPCA problem. In the final sections, we show a range of numerical experiments for simulated and real-world problems.
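For concreteness, the classical convex formulation is $\min_{L,S} \|L\|_* + \lambda \|S\|_1$ subject to $L + S = Y$; the sketch below solves it with a plain ADMM loop, whereas the dual smoothing and level set techniques reviewed in the paper are different solution strategies for the same problem class.

```python
# Principal-component-pursuit-style ADMM sketch for
#   min ||L||_* + lam ||S||_1   s.t.  L + S = Y.
# Problem sizes and the penalty mu are illustrative choices.
import numpy as np

def svt(M, tau):                            # singular value thresholding
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):                         # entrywise soft-thresholding
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

rng = np.random.default_rng(3)
m, n, r = 40, 30, 2
Y = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low rank
mask = rng.random((m, n)) < 0.05
Y[mask] += 10.0 * rng.standard_normal(mask.sum())              # sparse spikes

lam = 1.0 / np.sqrt(max(m, n))
mu = 1.0                                    # ADMM penalty parameter
L = np.zeros_like(Y); S = np.zeros_like(Y); U = np.zeros_like(Y)
for _ in range(200):
    L = svt(Y - S - U, 1.0 / mu)
    S = shrink(Y - L - U, lam / mu)
    U += L + S - Y                          # scaled dual update
print("rank(L):", np.linalg.matrix_rank(L, tol=1e-6),
      "nnz(S):", int((np.abs(S) > 1e-6).sum()))
```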


Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

Aravkin, Aleksandr Y., Burke, James V., Pillonetto, Gianluigi

arXiv.org Machine Learning

We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the $\ell_2$, Huber, $\ell_1$, and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.
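The dual representation is easy to verify numerically for the Huber loss, which arises as the QS function $\rho(r) = \sup_{|u| \le \kappa} (ur - u^2/2)$; a small check (grid resolution and test points are our own choices):

```python
# Numerical check of the quadratic-support representation of Huber:
#   rho(r) = sup_{|u| <= kappa} ( u r - u^2 / 2 ).
# Maximizing over a fine grid of u recovers the familiar closed form.
import numpy as np

kappa = 1.0
r = np.linspace(-3, 3, 13)
u = np.linspace(-kappa, kappa, 2001)

qs = np.max(np.outer(r, u) - 0.5 * u ** 2, axis=1)       # sup over u
closed = np.where(np.abs(r) <= kappa,
                  0.5 * r ** 2,
                  kappa * np.abs(r) - 0.5 * kappa ** 2)  # textbook Huber
print(np.max(np.abs(qs - closed)))                       # ~0: they agree
```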


Agglomerative Bregman Clustering

Telgarsky, Matus, Dasgupta, Sanjoy

arXiv.org Machine Learning

This manuscript develops the theory of agglomerative clustering with Bregman divergences. Geometric smoothing techniques are developed to deal with degenerate clusters. To allow for cluster models based on exponential families with overcomplete representations, Bregman divergences are developed for nondifferentiable convex functions.
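A minimal sketch of the objects involved, assuming only the textbook definition $D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle \nabla \varphi(y), x - y \rangle$ (the paper's smoothing construction for nondifferentiable $\varphi$ is not reproduced here):

```python
# Bregman divergences from two standard generators.
import numpy as np

def bregman(phi, grad_phi, x, y):
    # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Squared-norm generator recovers half the squared Euclidean distance.
phi_sq = lambda z: 0.5 * z @ z
grad_sq = lambda z: z
x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(bregman(phi_sq, grad_sq, x, y))        # 0.5 * ||x - y||^2 = 1.0

# Negative-entropy generator recovers KL divergence on the simplex.
phi_ent = lambda z: z @ np.log(z)
grad_ent = lambda z: np.log(z) + 1.0
p, q = np.array([0.2, 0.8]), np.array([0.5, 0.5])
print(bregman(phi_ent, grad_ent, p, q))      # equals KL(p || q), ~0.193
```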


Statistical Consistency of Finite-dimensional Unregularized Linear Classification

Telgarsky, Matus

arXiv.org Machine Learning

Binary linear classification operates as follows: obtain a new instance, determine a set of real-valued features, form their weighted combination, and output a label which is positive iff this combination is nonnegative. The interpretability, empirical performance, and theoretical depth of this scheme have all contributed to its continued popularity (Freund and Schapire, 1997; Friedman et al., 2000; Caruana and Niculescu-Mizil, 2006). In order to obtain the coefficients in the above weighting, convex optimization is typically employed. Specifically, rather than just trying to pick the weighting which makes the fewest mistakes over a finite sample -- which is computationally intractable -- consider instead paying attention to the amount by which these combinations clear the zero threshold, a quantity called the margin. Applying a convex penalty to these margins yields a convex optimization procedure, specifically one which can be specialized into both logistic regression and AdaBoost.
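A small sketch of this recipe (toy data, step size, and iteration count are our own choices): apply the logistic penalty to the margins $y_i \langle w, x_i \rangle$ and minimize by plain gradient descent.

```python
# Convex surrogate for the mistake count: minimize the mean logistic
# loss log(1 + exp(-margin)) of the margins y_i * <w, x_i>.
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 3
X = rng.standard_normal((n, d))
w_true = np.array([2.0, -1.0, 0.5])                      # illustrative
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(n))   # labels in {-1, +1}

w = np.zeros(d)
for _ in range(500):
    margins = y * (X @ w)
    # gradient of the mean logistic loss with respect to w
    coef = -y / (1.0 + np.exp(margins))
    w -= 0.5 * (X * coef[:, None]).mean(axis=0)
print("training accuracy:", (np.sign(X @ w) == y).mean())
```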