Optimization
Response to "Counterexample to global convergence of DSOS and SDSOS hierarchies"
Ahmadi, Amir Ali, Majumdar, Anirudha
In a recent note [8], the author provides a counterexample to the global convergence of what his work refers to as "the DSOS and SDSOS hierarchies" for polynomial optimization problems (POPs) and purports that this refutes claims in our extended abstract [4] and slides in [3]. The goal of this paper is to clarify that neither [4], nor [3], and certainly not our full paper [5], ever defined DSOS or SDSOS hierarchies as it is done in [8]. It goes without saying that no claims about convergence properties of the hierarchies in [8] were ever made as a consequence. What was stated in [4,3] was completely different: we stated that there exist hierarchies based on DSOS and SDSOS optimization that converge. This is indeed true as we discuss in this response. We also emphasize that we were well aware that some (S)DSOS hierarchies do not converge even if their natural SOS counterparts do. This is readily implied by an example in our prior work [5], which makes the counterexample in [8] superfluous. Finally, we provide concrete counterarguments to claims made in [8] that aim to challenge the scalability improvements obtained by DSOS and SDSOS optimization as compared to sum of squares (SOS) optimization. [3] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS: More tractable alternatives to SOS. Slides at the meeting on Geometry and Algebra of Linear Matrix Inequalities, CIRM, Marseille, 2013. [4] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS optimization: LP and SOCP-based alternatives to sum of squares optimization. In proceedings of the 48th annual IEEE Conference on Information Sciences and Systems, 2014. [5] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. arXiv:1706.02586, 2017. [8] C. Josz. Counterexample to global convergence of DSOS and SDSOS hierarchies. arXiv:1707.02964, 2017.
Memetic search for identifying critical nodes in sparse graphs
Zhou, Yangming, Hao, Jin-Kao, Glover, Fred
Critical node problems involve identifying a subset of critical nodes from an undirected graph whose removal results in optimizing a pre-defined measure over the residual graph. As useful models for a variety of practical applications, these problems are computationally challenging. In this paper, we study the classic critical node problem (CNP) and introduce an effective memetic algorithm for solving CNP. The proposed algorithm combines a double backbone-based crossover operator (to generate promising offspring solutions), a component-based neighborhood search procedure (to find high-quality local optima) and a rank-based pool updating strategy (to guarantee a healthy population). Specifically, the component-based neighborhood search integrates two key techniques, i.e., a two-phase node exchange strategy and a node weighting scheme. The double backbone-based crossover extends the idea of general backbone-based crossovers. Extensive evaluations on 42 synthetic and real-world benchmark instances show that the proposed algorithm discovers 21 new upper bounds and matches 18 previous best-known upper bounds. We also demonstrate the relevance of our algorithm for effectively solving a variant of the classic CNP, called the cardinality-constrained critical node problem. Finally, we investigate the usefulness of each key algorithmic component.
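To make the "pre-defined measure over the residual graph" concrete, the sketch below evaluates pairwise connectivity, i.e., the number of node pairs that remain connected after a chosen set of nodes is deleted, which is the measure usually associated with the classic CNP. The abstract does not name the measure, so this choice, the function name `pairwise_connectivity`, and the edge-list input format are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def pairwise_connectivity(num_nodes, edges, removed):
    """Count connected node pairs in the graph left after deleting `removed`.

    Assumption: nodes are labelled 0..num_nodes-1 and `edges` is a list of
    undirected (u, v) pairs. This is an illustrative objective evaluator only.
    """
    removed = set(removed)
    parent = list(range(num_nodes))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the endpoints of every edge that survives the node removal.
    for u, v in edges:
        if u in removed or v in removed:
            continue
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    # Size of each connected component of the residual graph.
    sizes = defaultdict(int)
    for v in range(num_nodes):
        if v not in removed:
            sizes[find(v)] += 1

    # A component with s nodes contributes s * (s - 1) / 2 connected pairs.
    return sum(s * (s - 1) // 2 for s in sizes.values())
```

On a five-node path `0-1-2-3-4`, removing the middle node splits the graph into two components of two nodes each, so the measure drops from 10 to 2. A search procedure such as the one described in the abstract would call an evaluator of this kind to compare candidate node sets.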
Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis
Zhang, Junyu, Ma, Shiqian, Zhang, Shuzhong
In this paper we study nonconvex and nonsmooth multi-block optimization over Riemannian manifolds with coupled linear constraints. Such optimization problems naturally arise from machine learning, statistical learning, compressive sensing, image processing, and tensor PCA, among others. We develop an ADMM-like primal-dual approach based on decoupled solvable subroutines such as linearized proximal mappings. First, we introduce the optimality conditions for the aforementioned optimization models. Then, the notion of $\epsilon$-stationary solutions is introduced as a result. The main part of the paper shows that the proposed algorithms enjoy an iteration complexity of $O(1/\epsilon^2)$ to reach an $\epsilon$-stationary solution. For prohibitively large tensor or machine learning models, we present a sampling-based stochastic algorithm with the same iteration complexity bound in expectation. In case the subproblems are not analytically solvable, a feasible curvilinear line-search variant of the algorithm based on retraction operators is proposed. Finally, we show specifically how the algorithms can be implemented to solve a variety of practical problems such as the NP-hard maximum bisection problem, the $\ell_q$ regularized sparse tensor principal component analysis and the community detection problem. Our preliminary numerical results show the great potential of the proposed methods.
Learning RBM with a DC programming Approach
Upadhya, Vidyadhar, Sastry, P. S.
By exploiting the property that the RBM log-likelihood function is the difference of convex functions, we formulate a stochastic variant of the difference of convex functions (DC) programming to minimize the negative log-likelihood. Interestingly, the traditional contrastive divergence algorithm is a special case of the above formulation and the hyperparameters of the two algorithms can be chosen such that the amount of computation per mini-batch is identical. We show that for a given computational budget the proposed algorithm almost always reaches a higher log-likelihood more rapidly, compared to the standard contrastive divergence algorithm. Further, we modify this algorithm to use the centered gradients and show that it is more efficient and effective compared to the standard centered gradient algorithm on benchmark datasets.
The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes
Kariotoglou, Nikolaos, Kamgarpour, Maryam, Summers, Tyler H., Lygeros, John
One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications. We consider the stochastic reach-avoid problem, in which the objective is to synthesize a control policy to maximize the probability of reaching a target set at a given time, while staying in a safe set at all prior times. We characterize the solution to this problem through an infinite dimensional linear program. We then develop a tractable approximation to the infinite dimensional linear program through finite dimensional approximations of the decision space and constraints. For a large class of Markov decision processes modeled by Gaussian mixture kernels, we show that through a proper selection of the finite dimensional space, one can further reduce the computational complexity of the resulting linear program. We validate the proposed method and analyze its potential with numerical case studies.
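The reach-avoid objective described above can be illustrated on a small finite-state MDP. The sketch below uses a plain backward dynamic programming recursion, not the linear programming characterization developed in the paper, and it assumes a finite state and action space with the target contained in the safe set. The function name `reach_avoid_values` and the nested-list layout `P[u][x][y]` are hypothetical.

```python
def reach_avoid_values(P, safe, target, horizon):
    """Maximal probability of being in `target` at time `horizon` while
    staying in `safe` at every earlier time, for each initial state.

    Assumptions (illustrative only): P[u][x][y] is the probability of moving
    from state x to state y under action u; states are 0..n-1; `safe` and
    `target` are sets of state indices.
    """
    num_actions = len(P)
    n = len(P[0])

    # Terminal condition: V_T(x) = 1 if x is in the target set, else 0.
    values = [1.0 if x in target else 0.0 for x in range(n)]

    # Backward recursion: V_k(x) = 1_safe(x) * max_u sum_y P(y | x, u) V_{k+1}(y).
    for _ in range(horizon):
        values = [
            max(
                sum(P[u][x][y] * values[y] for y in range(n))
                for u in range(num_actions)
            )
            if x in safe
            else 0.0
            for x in range(n)
        ]
    return values
```

With three states (0 safe, 1 target and absorbing, 2 unsafe and absorbing), an aggressive action that reaches the target or the unsafe state with probability 0.5 each, and a cautious action that stays put with probability 0.9, the recursion prefers the cautious action at the first step of a two-step horizon and the aggressive one at the last step.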
Steps Toward Robust Artificial Intelligence
Recent advances in artificial intelligence are encouraging governments and corporations to deploy AI in high-stakes settings including driving cars autonomously, managing the power grid, trading on stock exchanges, and controlling autonomous weapons systems. Such applications require AI methods to be robust to both the known unknowns (those uncertain aspects of the world about which the computer can reason explicitly) and the unknown unknowns (those aspects of the world that are not captured by the system's models). This article discusses recent progress in AI and then describes eight ideas related to robustness that are being pursued within the AI research community. While these ideas are a start, we need to devote more attention to the challenges of dealing with the known and unknown unknowns. These issues are fascinating, because they touch on the fundamental question of how finite systems can survive and thrive in a complex and dangerous world.
How is Distributed ADMM Affected by Network Topology?
França, Guilherme, Bento, José
When solving consensus optimization problems over a graph, there is often an explicit characterization of the convergence rate of Gradient Descent (GD) using the spectrum of the graph Laplacian. The same type of problems under the Alternating Direction Method of Multipliers (ADMM) are, however, poorly understood. For instance, simple but important non-strongly-convex consensus problems have not yet been analyzed, especially concerning the dependency of the convergence rate on the graph topology. Recently, for a non-strongly-convex consensus problem, a connection between distributed ADMM and lifted Markov chains was proposed, followed by a conjecture that ADMM is faster than GD by a square root factor in its convergence time, in close analogy to the mixing speedup achieved by lifting several Markov chains. Nevertheless, a proof of such a claim is still lacking. Here we provide a full characterization of the convergence of distributed over-relaxed ADMM for the same type of consensus problem in terms of the topology of the underlying graph. Our results provide explicit formulas for optimal parameter selection in terms of the second largest eigenvalue of the transition matrix of the graph's random walk. Another consequence of our results is a proof of the aforementioned conjecture, which, interestingly, we show is valid for any graph, even those whose random walks cannot be accelerated via Markov chain lifting.
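As a point of reference for the GD baseline mentioned above, the following sketch runs gradient descent on a simple consensus objective over a graph. It assumes the quadratic objective $f(x) = \frac{1}{2}\sum_{(i,j)\in E}(x_i - x_j)^2$, whose gradient is the graph Laplacian applied to $x$; this objective, the function name `consensus_gd`, and the step size are illustrative assumptions, and the ADMM scheme analyzed in the paper is not reproduced here.

```python
def consensus_gd(edges, x0, step, iters):
    """Gradient descent on f(x) = 1/2 * sum over edges (i, j) of (x_i - x_j)^2.

    Assumption: `edges` lists each undirected edge once as an (i, j) pair.
    The gradient equals L @ x, where L is the graph Laplacian, so the update
    is x <- x - step * L x and the sum of the entries of x is preserved.
    """
    x = list(x0)
    for _ in range(iters):
        grad = [0.0] * len(x)
        for i, j in edges:
            diff = x[i] - x[j]
            grad[i] += diff  # contribution of edge (i, j) to node i
            grad[j] -= diff  # and the opposite contribution to node j
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x
```

On a connected graph with a step size below $2/\lambda_{\max}(L)$, the iterates approach the average of the initial values, and the slowest-decaying mode is governed by the smallest nonzero Laplacian eigenvalue. That is the sense in which the graph spectrum characterizes the GD rate.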
On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization
The Hessian-vector product has been utilized to find a second-order stationary solution with a strong complexity guarantee (e.g., almost linear time complexity in the problem's dimensionality). In this paper, we propose to further reduce the number of Hessian-vector products for faster non-convex optimization. Previous algorithms need to approximate the smallest eigenvalue with sufficient precision (e.g., $\epsilon_2\ll 1$) in order to achieve a sufficiently accurate second-order stationary solution (i.e., $\lambda_{\min}(\nabla^2 f(\mathbf{x}))\geq -\epsilon_2$). In contrast, the proposed algorithms only need to compute the smallest eigenvector, approximating the corresponding eigenvalue up to a small power of the current gradient's norm. As a result, they can dramatically reduce the number of Hessian-vector products during the course of optimization before reaching first-order stationary points (e.g., saddle points). The key building block of the proposed algorithms is a novel updating step named the NCG step, which lets a noisy negative curvature descent compete with the gradient descent. We show that the worst-case time complexity of the proposed algorithms with their favorable prescribed accuracy requirements can match the best in the literature for achieving a second-order stationary point but with an arguably smaller per-iteration cost. We also show that the proposed algorithms can benefit from an inexact Hessian by developing variants that accept an inexact Hessian under a mild condition for achieving the same goal. Moreover, we develop a stochastic algorithm for a finite or infinite sum non-convex optimization problem. To the best of our knowledge, the proposed stochastic algorithm is the first one that converges to a second-order stationary point with high probability with a time complexity independent of the sample size and almost linear in dimensionality.
Large-Scale Quadratically Constrained Quadratic Program via Low-Discrepancy Sequences
Basu, Kinjal, Saha, Ankan, Chatterjee, Shaunak
We consider the problem of solving a large-scale Quadratically Constrained Quadratic Program. Such problems occur naturally in many scientific and web applications. Although there are efficient methods which tackle this problem, they are mostly not scalable. In this paper, we develop a method that transforms the quadratic constraint into a linear form by sampling a set of low-discrepancy points. The transformed problem can then be solved by applying any state-of-the-art large-scale quadratic programming solver. We show the convergence of our approximate solution to the true solution as well as some finite sample error bounds. Experimental results are also shown to demonstrate scalability as well as improved quality of approximation in practice.
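One plausible reading of "transforms the quadratic constraint into a linear form by sampling a set of low-discrepancy points" is sketched below for a two-dimensional constraint $x^\top A x \le b$ with $A$ symmetric positive definite. Boundary points are generated from a van der Corput sequence, and each contributes the tangent half-plane $(A p)^\top x \le b$. The restriction to 2D, the choice of sequence, and the names `van_der_corput` and `linearize_quadratic_constraint` are assumptions for illustration; the paper's actual construction may differ.

```python
import math


def van_der_corput(k, base=2):
    """k-th element of the base-`base` van der Corput sequence in [0, 1)."""
    q, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, base)
        denom *= base
        q += digit / denom
    return q


def linearize_quadratic_constraint(A, b, num_points):
    """Outer-approximate {x : x^T A x <= b} in 2D by tangent half-planes.

    Assumptions (illustrative only): A is a 2x2 symmetric positive definite
    matrix given as nested lists and b > 0. Returns a list of (g, b) pairs,
    each encoding the linear constraint g[0] * x[0] + g[1] * x[1] <= b.
    """
    cuts = []
    for k in range(num_points):
        # Low-discrepancy angle, turned into a unit direction d.
        theta = 2.0 * math.pi * van_der_corput(k)
        d = (math.cos(theta), math.sin(theta))
        Ad = (
            A[0][0] * d[0] + A[0][1] * d[1],
            A[1][0] * d[0] + A[1][1] * d[1],
        )
        # Scale d so that p = scale * d lies on the boundary p^T A p = b.
        scale = math.sqrt(b / (d[0] * Ad[0] + d[1] * Ad[1]))
        # Tangent half-plane at p: (A p)^T x <= b, with A p = scale * A d.
        cuts.append(((scale * Ad[0], scale * Ad[1]), b))
    return cuts
```

Because $x^\top A x$ is convex, every point that satisfies the quadratic constraint also satisfies all of the tangent cuts, so the linearized feasible set contains the original one and tightens as more points are sampled. The resulting problem has only linear constraints and can be handed to a quadratic programming solver.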
Dynamic Assortment Personalization in High Dimensions
Kallus, Nathan, Udell, Madeleine
We study the problem of dynamic assortment personalization with large, heterogeneous populations and wide arrays of products, and demonstrate the importance of structural priors for effective, efficient large-scale personalization. Assortment personalization is the problem of choosing, for each individual or consumer segment (type), a best assortment of products, ads, or other offerings (items) so as to maximize revenue. This problem is central to revenue management in e-commerce, online advertising, and multi-location brick-and-mortar retail, where both items and types can number in the millions. We formulate the dynamic assortment personalization problem as a discrete-contextual bandit with $m$ contexts (customer types) and exponentially many arms (assortments of the $n$ items). We assume that each type's preferences follow a simple parametric model with $n$ parameters. In all, there are $mn$ parameters, and existing literature suggests that order-optimal regret scales as $mn$. However, the data required to estimate so many parameters is orders of magnitude larger than the data available in most revenue management applications, and the optimal regret under these models is unacceptably high. In this paper, we impose a natural structure on the problem: a small latent dimension, or low rank. In the static setting, we show that this model can be efficiently learned from surprisingly few interactions, using a time- and memory-efficient optimization algorithm that converges globally whenever the model is learnable. In the dynamic setting, we show that structure-aware dynamic assortment personalization can have regret that is an order of magnitude smaller than structure-ignorant approaches. We validate our theoretical results empirically.