Optimization
Outlier-Robust Sparse Estimation via Non-Convex Optimization
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA. We develop novel and simple optimization formulations for these problems such that any approximate stationary point of the associated optimization problem yields a near-optimal solution for the underlying robust estimation task. As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks. The obtained algorithms are simple, practical, and succeed under broader distributional assumptions compared to prior work.
Review for NeurIPS paper: Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization
Summary and Contributions: Multi-objective optimization (MOO) problems involve more than one objective function that are to be minimized or maximized. For non-trivial instances of MOO problems, no unique solution exists that simultaneously optimizes all objectives. In that case, the aim to identify the set of Pareto optimal solutions of the problem. Bayesian Optimization (BO) approaches rely on acquisition functions (AF), to evaluate promising query points for function evaluations. BO approaches for MOO require to define AFs that are applicable to the notion of Pareto optimality.
Review for NeurIPS paper: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
Summary and Contributions: This paper presents a surrogate based end-to-end method to solve optimization problems where important decision parameters are unknown (and hence need to be inferred). In order to leverage the usual scalability and smoothness issues caused by usual "predict and optimize" approaches, the optimization problem is reparameterized by a learned surrogate that allows optimizing in a smaller dimensional space through a linear transformation. The authors provide some theoretical guarantees on the convexity and submodularity properties of the transformed optimization problem, on the partial pseudo-convexity of the re-parameterized learning problem and on the sample complexity of the predictive model learning task, thus arguing that such an approach should be in practice tractable and learnable with gradient descent (in spite of the non-convexity of the reparameterized learning problem). They also provide experiments carried out in three types of configurations, including convex, non-convex and submodular optimization problems and compare both the performance and scalability of the approach to (1) a two-stage approach that uses a separate ground truth-based model prior to optimization on the inferred parameters and (2) a classical end-to-end decision-focused method. The presented approach seems to significantly improve the performances when solving problems with numerous possible local optima (i.e.
Review for NeurIPS paper: Simple and Fast Algorithm for Binary Integer and Online Linear Programming
Weaknesses: - My primary concern is insufficient comparison with the existing literature on online LP, like the two works cited [Agrawal '14, Kesselheim et al. '14]: - The paper claims novelty in the sublihear competitive ratios obtained in those works of the form O(1 - \eps(m,n)), so that \eps(m,n) * OPT is the regret. From a glance at the works cited by [Agrawal '14], "Online stochastic packing applied to display ad allocation" [Feldman et al. '10] has an 1/OPT term in this competitive ratio, giving a sublinear regret bound. Some clarifying discussion is necessary here. Some discussion and a clearer comparison is necessary, since this line of work is so well-established. Some clarification about this would be appreciated; in any case, the manuscript should discuss this at greater depth.
Review for NeurIPS paper: Simple and Fast Algorithm for Binary Integer and Online Linear Programming
The analysis of the algorithm is conducted under two models: stochastic inputs (columns of LP drawn i.i.d.) and random permutation model (columns of LP revealed in a random order). The authors develop a simple and fast online algorithm performing stochastic subgradient descent on a dual problem with provable guarantees on the regret and constraint violation. The paper received borderline reviews with a slight lean towards acceptance. The main strengths of the paper are: - New techniques for deriving the regret bounds in the random permutation model (permutational Rademacher complexity). The main weaknesses were: - Insufficient comparison with the existing online LP literature, in particular with the competitive ratio bounds of previous work.
Reviews: An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints
Applying ALM to the Burer-Monterio problem and to nonlinear programs in general is natural and well summarized in the monograph Ref [8]. Allowing first-order and second-order approximate solvers for the primal subproblems is also classic, and can be found in, e.g., Ch 8 & 9 of Ref [8]. I think the main novelties here lie at the nonsmooth, convex term g(x) and the convergence rate results. Sec 5 of the paper has provided a comprehensive review of pertinent results under different assumptions. I have several concerns that I hope the authors can address: * The BM example does not quite justify the inclusion of the possibly nonsmooth term g in (1). The authors may want to balance out and briefly discuss other examples as appearing in the experiments.
Reviews: An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints
This paper uses the Augmented Lagrangian method to solve optimization problems for a sum of functions f and g, where f is nonconvex and g is convex but'proximal-friendly' subject to quite general nonlinear constraints. The proposed method solves the primal problem within some error epsilon_k that is gradually decreased as a penalty schedule beta_k is increasing across iterations. The approximate intermediate problems are solved using first order and second order solvers. The proposed analysis is technically non-trivial and interesting. The presentation of the paper was poor and at times confusing which made this a borderline paper.
Reviews: Shadowing Properties of Optimization Algorithms
The paper presents several "shadowing" results for gradient descent and the heavy ball method, under several conditions on the objective. In short, the authors provide conditions under which a discrete approximation of an ODE defines a trajectory that "stays close" to the actual trajectory of the ODE. This research is motivated by a by a recent paper by Su, Jordan, and Candes that models Nesterov's method via an ODE: this leads the authors to ask the question of when an ODE solution indeed well approximates a discrete algorithm, which is what would be implemented in practice. Although the interest and motivation is mostly on HB, the bulk of the results presented in the paper are for GD. The paper is well-written overall, and the results are interesting, if somewhat shallow.