Goto

Collaborating Authors

 Optimization


Finding Optimal Longest Paths by Dynamic Programming in Parallel

AAAI Conferences

We propose an exact algorithm for solving the longest path problem between two given vertices in undirected weighted graphs. By using graph partitioning and dynamic programming, we obtain an algorithm that is significantly faster than other state-of-the-art methods. This enables us to solve instances that were previously unsolved and solve hard instances significantly faster. We also present a parallel version of the algorithm.


Learning and Utilizing Interaction Patterns among Neighborhood-Based Heuristics

AAAI Conferences

This paper proposes a method for learning and utilizing potentially useful interaction patterns among neighborhood-based heuristics. It is built upon a previously proposed framework designed for facilitating the task of combining multiple neighborhood-based heuristics. Basically, an algorithm derived from this framework will operate by chaining the heuristics in a pipelined fashion. Conceptually, we can view this framework as an algorithmic template that contains two user-defined components: 1) the policy H for selecting heuristics, and 2) the policy L for choosing the length of the pipeline that chains the selected heuristics. In this paper, we will develop a method that automatically derives a policy H by analyzing the experience collected from running a baseline algorithm. This analysis will distill potentially useful patterns of interactions among heuristics, and give an estimate for the frequency of using each pattern. The empirical results on three problem domains show the effectiveness of the proposed approach.


Safe Policy Improvement with Soft Baseline Bootstrapping

arXiv.org Artificial Intelligence

Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called baseline in this setting. Previous work shows that the SPI objective improves mean performance as compared to using the basic RL objective, which boils down to solving the MDP with maximum likelihood. Here, we build on that work and improve more precisely the SPI with Baseline Bootstrapping algorithm (SPIBB) by allowing the policy search over a wider set of policies. Instead of binarily classifying the state-action pairs into two sets (the \textit{uncertain} and the \textit{safe-to-train-on} ones), we adopt a softer strategy that controls the error in the value estimates by constraining the policy change according to the local model uncertainty. The method can take more risks on uncertain actions all the while remaining provably-safe, and is therefore less conservative than the state-of-the-art methods. We propose two algorithms (one optimal and one approximate) to solve this constrained optimization problem and empirically show a significant improvement over existing SPI algorithms both on finite MDPs and on infinite MDPs with a neural network function approximation.


The Landscape of Non-convex Empirical Risk with Degenerate Population Risk

arXiv.org Machine Learning

The landscape of empirical risk has been widely studied in a series of machine learning problems, including low-rank matrix factorization, matrix sensing, matrix completion, and phase retrieval. In this work, we focus on the situation where the corresponding population risk is a degenerate non-convex loss function, namely, the Hessian of the population risk can have zero eigenvalues. Instead of analyzing the non-convex empirical risk directly, we first study the landscape of the corresponding population risk, which is usually easier to characterize, and then build a connection between the landscape of the empirical risk and its population risk. In particular, we establish a correspondence between the critical points of the empirical risk and its population risk without the strongly Morse assumption, which is required in existing literature but not satisfied in degenerate scenarios. We also apply the theory to matrix sensing and phase retrieval to demonstrate how to infer the landscape of empirical risk from that of the corresponding population risk.


Quantum and Classical Algorithms for Approximate Submodular Function Minimization

arXiv.org Machine Learning

Submodular functions are set functions mapping every subset of some ground set of size $n$ into the real numbers and satisfying the diminishing returns property. Submodular minimization is an important field in discrete optimization theory due to its relevance for various branches of mathematics, computer science and economics. The currently fastest strongly polynomial algorithm for exact minimization [LSW15] runs in time $\widetilde{O}(n^3 \cdot \mathrm{EO} + n^4)$ where $\mathrm{EO}$ denotes the cost to evaluate the function on any set. For functions with range $[-1,1]$, the best $\epsilon$-additive approximation algorithm [CLSW17] runs in time $\widetilde{O}(n^{5/3}/\epsilon^{2} \cdot \mathrm{EO})$. In this paper we present a classical and a quantum algorithm for approximate submodular minimization. Our classical result improves on the algorithm of [CLSW17] and runs in time $\widetilde{O}(n^{3/2}/\epsilon^2 \cdot \mathrm{EO})$. Our quantum algorithm is, up to our knowledge, the first attempt to use quantum computing for submodular optimization. The algorithm runs in time $\widetilde{O}(n^{5/4}/\epsilon^{5/2} \cdot \log(1/\epsilon) \cdot \mathrm{EO})$. The main ingredient of the quantum result is a new method for sampling with high probability $T$ independent elements from any discrete probability distribution of support size $n$ in time $O(\sqrt{Tn})$. Previous quantum algorithms for this problem were of complexity $O(T\sqrt{n})$.


An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

arXiv.org Machine Learning

What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards? Ideally, we would like to get a policy driving towards a uniform state-action visitation (highly exploring) in a minimum number of steps (fast mixing), in order to ease efficient learning of any goal-conditioned policy later on. Unfortunately, it is remarkably arduous to directly learn an optimal policy of this nature. In this paper, we propose a novel surrogate objective for learning highly exploring and fast mixing policies, which focuses on maximizing a lower bound to the entropy of the steady-state distribution induced by the policy. In particular, we introduce three novel lower bounds, that lead to as many optimization problems, that tradeoff the theoretical guarantees with computational complexity. Then, we present a model-based reinforcement learning algorithm, IDE$^{3}$AL, to learn an optimal policy according to the introduced objective. Finally, we provide an empirical evaluation of this algorithm on a set of hard-exploration tasks.


Trust-Region Variational Inference with Gaussian Mixture Models

arXiv.org Machine Learning

Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by establishing information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective enabling us to update the components independently. The use of the lower bound ensures convergence to a local optimum of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multi-modal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude less computational resources.


Two-block vs. Multi-block ADMM: An empirical evaluation of convergence

arXiv.org Machine Learning

Alternating Direction Method of Multipliers (ADMM) has become a widely used optimization method for convex problems, particularly in the context of data mining in which large optimization problems are often encountered. ADMM has several desirable properties, including the ability to decompose large problems into smaller tractable sub-problems and ease of parallelization, that are essential in these scenarios. The most common form of ADMM is the two-block, in which two sets of primal variables are updated alternatingly. Recent years have seen advances in multi-block ADMM, which update more than two blocks of primal variables sequentially. In this paper, we study the empirical question: {\em Is two-block ADMM always comparable with sequential multi-block ADMM solving an equivalent problem?} In the context of optimization problems arising in multi-task learning, through a comprehensive set of experiments we surprisingly show that multi-block ADMM consistently outperformed two-block ADMM on optimization performance, and as a consequence on prediction performance, across all datasets and for the entire range of dual step sizes. Our results have an important practical implication: rather than simply using the popular two-block ADMM, one may considerably benefit from experimenting with multi-block ADMM applied to an equivalent problem.


SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

arXiv.org Machine Learning

This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions (with probability one). The SC condition is then used to establish an equivalence relationship between two different notions of SOSPs, one of which is computationally easy to verify. Based on this particular notion of SOSP, we design an algorithm named the Successive Negative-curvature grAdient Projection (SNAP), which successively performs either conventional gradient projection or some negative curvature based projection steps to find SOSPs. SNAP and its first-order extension SNAP$^+$, require $\mathcal{O}(1/\epsilon^{2.5})$ iterations to compute an $(\epsilon, \sqrt{\epsilon})$-SOSP, and their per-iteration computational complexities are polynomial in the number of constraints and problem dimension. To our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate have been designed to find SOSPs of the important class of non-convex problems with linear constraints.


Global Optimality Guarantees for Nonconvex Unsupervised Video Segmentation

arXiv.org Machine Learning

In this paper, we consider the problem of unsupervised video object segmentation via background subtraction. Specifically, we pose the nonsemantic extraction of a video's moving objects as a nonconvex optimization problem via a sum of sparse and low-rank matrices. The resulting formulation, a nonnegative variant of robust principal component analysis, is more computationally tractable than its commonly employed convex relaxation, although not generally solvable to global optimality. In spite of this limitation, we derive intuitive and interpretable conditions on the video data under which the uniqueness and global optimality of the object segmentation are guaranteed using local search methods. We illustrate these novel optimality criteria through example segmentations using real video data.