Goto

Collaborating Authors

 nonconvexity



Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Neural Information Processing Systems

We establish theoretical results concerning all local optima of various regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss function satisfies restricted strong convexity and the penalty function satisfies suitable regularity conditions, any local optimum of the composite objective function lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision epsilon in log(1/epsilon) iterations, which is the fastest possible rate of any first-order method. We provide a variety of simulations to illustrate the sharpness of our theoretical predictions.



Reviews: Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

Neural Information Processing Systems

Originality: The authors apply the idea that overparametrization induces local linearization, which has been documented for supervised learning, and in another submission for TD learning. In particular, they decompose the error into two terms, one due to TD, and the other due to SGD, and incorporate them in the analysis of infinite-dimensional mirror descent. The insight that the previous previous analysis for TD could be generalised to a meta algorithm that includes both TD and SGD as particular cases is key. Related work is adequately cited, and differences with previous works are clearly stated, including differences with the sister submission [5]. Quality: The submission seems technically sound, and includes detailed proofs (I just skimmed through them). This is a complete piece of work.


Planning Shorter Paths in Graphs of Convex Sets by Undistorting Parametrized Configuration Spaces

arXiv.org Artificial Intelligence

Abstract-- Optimization based motion planning provides a useful modeling framework through various costs and constraints. Using Graph of Convex Sets (GCS) for trajectory optimization gives guarantees of feasibility and optimality by representing configuration space as the finite union of convex sets. Nonlinear parametrizations can be used to extend this technique to handle cases such as kinematic loops, but this distorts distances, such that solving with convex objectives will yield paths that are suboptimal in the original space. We present a method to extend GCS to nonconvex objectives, allowing us to "undistort" the optimization landscape while maintaining feasibility guarantees. We demonstrate our method's efficacy on three different robotic planning domains: a bimanual robot moving an object with both arms, the set of 3D rotations using Euler angles, and a rational parametrization of kinematics that enables certifying regions as collision free. Across the board, our method significantly improves path length and trajectory duration with only a minimal increase in runtime.


Towards Convexity in Anomaly Detection: A New Formulation of SSLM with Unique Optimal Solutions

arXiv.org Artificial Intelligence

An unsolved issue in widely used methods such as Support Vector Data Description (SVDD) and Small Sphere and Large Margin SVM (SSLM) for anomaly detection is their nonconvexity, which hampers the analysis of optimal solutions in a manner similar to SVMs and limits their applicability in large-scale scenarios. In this paper, we introduce a novel convex SSLM formulation which has been demonstrated to revert to a convex quadratic programming problem for hyperparameter values of interest. Leveraging the convexity of our method, we derive numerous results that are unattainable with traditional nonconvex approaches. We conduct a thorough analysis of how hyperparameters influence the optimal solution, pointing out scenarios where optimal solutions can be trivially found and identifying instances of ill-posedness. Most notably, we establish connections between our method and traditional approaches, providing a clear determination of when the optimal solution is unique -- a task unachievable with traditional nonconvex methods. We also derive the {\nu}-property to elucidate the interactions between hyperparameters and the fractions of support vectors and margin errors in both positive and negative classes.


Reviews: On the Optimization Landscape of Tensor Decompositions

Neural Information Processing Systems

Specifically, it studies random over-complete tensors. The associated objective function is nonconvex, yet in practice simple methods based on gradient ascent are observed to solve this problem. This paper proves why we should expect such outcome by showing that there is almost no local maxima other than the global maxima of the problem when the optimization is initialized by any solution that is slightly better than random guess. Importantly, it is shown that these initial points do not have to be close to the true components of the tensor. This is an interesting result and well written paper. The analysis involves two steps: local (points close to true components) and global (point far from true components). The number of local maxima in each case is analyzed and shown to be exactly 2n for the former and almost nonexistent for the latter.


Regularized M estimators with Statistical and algorithmic theory for local optima

Neural Information Processing Systems

We establish theoretical results concerning local optima of regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP.


On the nonconvexity of some push-forward constraints and its consequences in machine learning

arXiv.org Machine Learning

The push-forward operation enables one to redistribute a probability measure through a deterministic map. It plays a key role in statistics and optimization: many learning problems (notably from optimal transport, generative modeling, and algorithmic fairness) include constraints or penalties framed as push-forward conditions on the model. However, the literature lacks general theoretical insights on the (non)convexity of such constraints and its consequences on the associated learning problems. This paper aims at filling this gap. In a first part, we provide a range of sufficient and necessary conditions for the (non)convexity of two sets of functions: the maps transporting one probability measure to another; the maps inducing equal output distributions across distinct probability measures. This highlights that for most probability measures, these push-forward constraints are not convex. In a second time, we show how this result implies critical limitations on the design of convex optimization problems for learning generative models or group-fair predictors. This work will hopefully help researchers and practitioners have a better understanding of the critical impact of push-forward conditions onto convexity.


Structure Learning with Continuous Optimization: A Sober Look and Beyond

arXiv.org Artificial Intelligence

Bayesian networks are a class of probabilistic graphical models that encode probabilistic distributions in a compact way (Pearl, 1988; Koller and Friedman, 2009). Recovery of their graphical structures from data, represented by directed acyclic graphs (DAGs), has found applications in several fields such as genetics (Peters et al., 2017) and education (Gong et al., 2022). This problem is NP-hard in general (Chickering, 1996; Chickering et al., 2004) owing to the combinatorial space of DAGs. Classical structure learning approaches fall into two broad categories, i.e., constraint-based methods and score-based methods. Constraint-based methods, such as PC (Spirtes and Glymour, 1991), employ conditional independence tests to estimate the skeleton and further perform edge orientation up to the Markov equivalence class (MEC) (Spirtes et al., 2001). Score-based methods typically assign a score to each structure and search for a high-scoring structure in the space of DAGs or equivalence classes (Koivisto and Sood, 2004; Singh and Moore, 2005; Cussens, 2011; Yuan and Malone, 2013). These methods often adopt greedy search because of the large space of possible structures (Chickering, 1996), such as GES (Chickering, 2002) and GDS (Peters and Bühlmann, 2013). Recently, Zheng et al. (2018) proposed a smooth characterization of acyclicity and transformed the structure learning problem of discrete nature into a continuous, nonconvex optimization problem, thus enabling the application of gradient-based methods.