AITopics | Optimization

Optimization

Convex Optimization Procedure for Clustering: Theoretical Revisit

Zhu, Changbo, Xu, Huan, Leng, Chenlei, Yan, Shuicheng

Neural Information Processing Systems, Feb-14-2020, 08:12:30 GMT

In this paper, we present a theoretical analysis of SON, a convex optimization procedure for clustering using a sum-of-norms (SON) regularization recently proposed in [ICML2011Hocking_419, SON, Lindsten650707, pelckmans2005convex]. In particular, we show that if the samples are drawn from two cubes, each being one cluster, then SON can provably identify the cluster membership provided that the distance between the two cubes is larger than a threshold which (linearly) depends on the size of the cube and the ratio of numbers of samples in each cluster. To the best of our knowledge, this paper is the first to provide a rigorous analysis of why and when SON works. We believe this may provide important insights for developing novel convex-optimization-based algorithms for clustering. Papers published at the Neural Information Processing Systems Conference.
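For reference, the sum-of-norms clustering problem is commonly written as follows. The notation is assumed here and not taken from the paper: samples $a_1,\dots,a_n\in\mathbb{R}^d$, one centroid $x_i$ per sample, uniform pairwise weights, the $\ell_2$ norm, and a regularization weight $\alpha>0$. Samples whose optimal centroids coincide are read off as one cluster, and a larger $\alpha$ fuses more centroids together.

```latex
\min_{x_1,\dots,x_n\in\mathbb{R}^d}\;
  \frac{1}{2}\sum_{i=1}^{n}\|x_i-a_i\|_2^2
  \;+\;\alpha\sum_{1\le i<j\le n}\|x_i-x_j\|_2
```

The objective is strongly convex in $(x_1,\dots,x_n)$, so the cluster assignment it produces is unique for each $\alpha$.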

clustering, convex optimization procedure, theoretical revisit

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

A New Alternating Direction Method for Linear Programming

Wang, Sinong, Shroff, Ness

Neural Information Processing Systems, Feb-14-2020, 07:59:22 GMT

It is well known that, for a linear program (LP) with constraint matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, the Alternating Direction Method of Multipliers converges globally and linearly at a rate $O((\|\mathbf{A}\|_F^2+mn)\log(1/\epsilon))$. However, such a rate is related to the problem dimension, and the algorithm exhibits a slow and fluctuating "tail convergence" in practice. In this paper, we propose a new variable splitting method for LP and prove that our method has a convergence rate of $O(\|\mathbf{A}\|^2\log(1/\epsilon))$. The proof is based on simultaneously estimating the distance from a pair of primal-dual iterates to the optimal primal and dual solution sets by certain residuals. In practice, this yields a new first-order LP solver that can exploit both the sparsity and the specific structure of matrix $\mathbf{A}$. Compared with the current fastest LP solvers, it achieves a significant speedup on important problems such as basis pursuit, inverse covariance matrix estimation, L1 SVM, and nonnegative matrix factorization.
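For comparison, here is a sketch of a classical ADMM scheme for an LP in standard form, $\min c^\top x$ subject to $\mathbf{A}x=b$, $x\ge 0$. It uses the textbook consensus splitting $x=z$, with $x$ kept in the affine set and $z$ in the nonnegative orthant. This is not the new splitting proposed in the paper. The function name, the full-row-rank assumption on $\mathbf{A}$, and the fixed penalty $\rho$ are illustrative assumptions.

```python
import numpy as np


def admm_lp(c, A, b, rho=1.0, iters=2000):
    """Classical ADMM for  min c^T x  s.t.  A x = b, x >= 0  via the splitting x = z.

    Assumes A has full row rank so that A @ A.T is invertible.
    """
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)                      # scaled dual variable for the constraint x = z
    AAt = A @ A.T
    for _ in range(iters):
        # x-update: minimize c^T x + (rho/2)||x - z + u||^2 subject to A x = b,
        # i.e. project w = z - u - c/rho onto the affine set {x : A x = b}.
        w = z - u - c / rho
        x = w - A.T @ np.linalg.solve(AAt, A @ w - b)
        # z-update: project x + u onto the nonnegative orthant.
        z = np.maximum(0.0, x + u)
        # Dual ascent step on the consensus constraint x = z.
        u = u + x - z
    return z


# Toy LP: minimize x1 + 2*x2 subject to x1 + x2 = 1, x >= 0 (optimum at x = (1, 0)).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_opt = admm_lp(c, A, b)
```

Each iteration needs only one solve with the fixed matrix $\mathbf{A}\mathbf{A}^\top$ (which could be factored once) plus elementwise operations. Its rate depends on the problem data and dimensions, which is what the paper's new splitting aims to improve.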

alternating direction method, linear programming, new alternating direction method

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/ε)

Xu, Yi, Yan, Yan, Lin, Qihang, Yang, Tianbao

Neural Information Processing Systems, Feb-14-2020, 07:58:35 GMT

In this paper, we develop a novel HOmotoPy Smoothing (HOPS) algorithm for solving a family of non-smooth problems. Each problem is composed of a non-smooth term with an explicit max-structure plus either a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption of strong convexity. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\tilde O(1/\epsilon^{1-\theta})$, where $\theta\in(0,1]$ captures the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages. It decreases the smoothing parameter from stage to stage until the smoothed function is a sufficiently good approximation of the original function.
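The stage-wise idea can be sketched on a toy problem. Assume the objective $f(x)=\sum_i |x-a_i|$ in one dimension; its minimizer is the median of the $a_i$. Its max-structure $|u|=\max_{|v|\le 1} uv$ is smoothed by subtracting $\frac{\mu}{2}v^2$, which gives the Huber function. Each stage then runs Nesterov's accelerated gradient on the smoothed objective. The halving schedule, stage lengths, and function names are illustrative assumptions, not the authors' implementation.

```python
def huber_grad(u, mu):
    """Gradient of h_mu(u) = max_{|v|<=1} (u*v - mu*v**2/2), Nesterov's smoothing of |u|.

    The maximizing v is clip(u/mu, -1, 1), which is also the gradient.
    """
    return max(-1.0, min(1.0, u / mu))


def smoothed_grad(x, points, mu):
    """Gradient of f_mu(x) = sum_i h_mu(x - a_i), the smoothed version of sum_i |x - a_i|."""
    return sum(huber_grad(x - a, mu) for a in points)


def accelerated_gradient(x0, points, mu, iters):
    """Nesterov's accelerated gradient method on f_mu, started from x0."""
    lipschitz = len(points) / mu          # each Huber term has a (1/mu)-Lipschitz gradient
    x_prev, y, t_prev = x0, x0, 1.0
    for _ in range(iters):
        x = y - smoothed_grad(y, points, mu) / lipschitz
        t = (1.0 + (1.0 + 4.0 * t_prev ** 2) ** 0.5) / 2.0
        y = x + ((t_prev - 1.0) / t) * (x - x_prev)
        x_prev, t_prev = x, t
    return x_prev


def homotopy_smoothing(points, x0=0.0, mu0=10.0, stages=10, iters_per_stage=50):
    """Hypothetical schedule: warm-start each stage and halve mu after it."""
    x, mu = x0, mu0
    for _ in range(stages):
        x = accelerated_gradient(x, points, mu, iters_per_stage)
        mu /= 2.0
    return x


x_star = homotopy_smoothing([1.0, 2.0, 7.0, 10.0, 11.0])  # approximately 7.0, the median
```

Warm-starting each stage from the previous solution lets early, heavily smoothed stages make fast progress. The later, less smoothed stages only need to refine an already good point.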

homotopy smoothing, iteration complexity, non-smooth problem

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)

Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach

Lam, Remi, Willcox, Karen, Wolpert, David H.

Neural Information Processing Systems, Feb-14-2020, 07:00:41 GMT

We consider the problem of optimizing an expensive objective function when a finite budget of total evaluations is prescribed. In that context, the optimal solution strategy for Bayesian optimization can be formulated as a dynamic programming instance. We show how to approximate the solution of this dynamic programming problem using rollout, and propose rollout heuristics specifically designed for the Bayesian optimization setting. We present numerical experiments showing that the resulting algorithm for optimization with a finite budget outperforms several popular Bayesian optimization algorithms.

approximate dynamic programming approach, bayesian optimization, finite budget

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Predictive Entropy Search for Efficient Global Optimization of Black-box Functions

Hernández-Lobato, José Miguel, Hoffman, Matthew W., Ghahramani, Zoubin

Neural Information Processing Systems, Feb-14-2020, 07:00:27 GMT

We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot.

black-box function, efficient global optimization, predictive entropy search

Industry: Transportation > Air (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.56)

Integration Methods and Optimization Algorithms

Scieur, Damien, Roulet, Vincent, Bach, Francis, d'Aspremont, Alexandre

Neural Information Processing Systems, Feb-14-2020, 06:59:56 GMT

We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation. Compared with recent advances in this vein, the differential equation considered here is the basic gradient flow, and we derive a class of multi-step schemes which includes accelerated algorithms, using classical conditions from numerical analysis. Multi-step schemes integrate the differential equation using larger step sizes, which intuitively explains the acceleration phenomenon.
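The claim that a multi-step scheme follows the same gradient flow with larger steps can be illustrated with Polyak's heavy-ball method. It is chosen here as a simple two-step example and is not necessarily the scheme derived in the paper. On the quadratic $f(x)=\frac12\sum_i\lambda_i x_i^2$ with $\lambda\in\{1,100\}$ (so $\mu=1$, $L=100$), gradient descent is explicit Euler with step $h=1/L$. Heavy ball uses the classical tuning $h=4/(\sqrt{L}+\sqrt{\mu})^2$ and $\beta=\big((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu})\big)^2$. The test problem and iteration count are assumptions for illustration.

```python
import math


def gradient_descent(grad, x0, h, iters):
    """Explicit Euler discretization of the gradient flow x'(t) = -grad f(x(t))."""
    x = list(x0)
    for _ in range(iters):
        x = [xi - h * gi for xi, gi in zip(x, grad(x))]
    return x


def heavy_ball(grad, x0, h, beta, iters):
    """Two-step scheme x_{k+1} = x_k - h*grad f(x_k) + beta*(x_k - x_{k-1})."""
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        g = grad(x)
        x_next = [xi - h * gi + beta * (xi - xp) for xi, gi, xp in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


# Quadratic f(x) = 0.5 * sum(lam_i * x_i**2), minimized at the origin.
lams = [1.0, 100.0]
mu, L = min(lams), max(lams)


def grad(x):
    return [lam * xi for lam, xi in zip(lams, x)]


h_hb = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
beta = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

gd_err = math.hypot(*gradient_descent(grad, [1.0, 1.0], 1.0 / L, 100))
hb_err = math.hypot(*heavy_ball(grad, [1.0, 1.0], h_hb, beta, 100))
effective_step = h_hb / (1.0 - beta)  # step along the flow once the momentum settles
```

Under these parameters the heavy-ball iterate advances the flow by an effective step $h/(1-\beta)=0.1$, ten times gradient descent's $1/L=0.01$. After 100 iterations its error is orders of magnitude smaller.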

differential equation, integration method and optimization algorithm, numerical analysis

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.79)

Learning convolution filters for inverse covariance estimation of neural network connectivity

Mohler, George

Neural Information Processing Systems, Feb-14-2020, 06:59:52 GMT

We consider the problem of inferring direct neural network connections from calcium imaging time series. Inverse covariance estimation has proven to be a fast and accurate method for learning macro- and micro-scale network connectivity in the brain. In a recent Kaggle Connectomics competition, inverse covariance was the main component of several top-ten solutions, including our own and the winning team's algorithm. However, the accuracy of inverse covariance estimation is highly sensitive to signal preprocessing of the calcium fluorescence time series. Furthermore, brute-force optimization methods such as grid search and coordinate ascent over signal processing parameters are time-intensive. Learning may take several days, and parameters that optimize one network may not generalize to networks with different sizes and parameters. In this paper we show how inverse covariance estimation can be dramatically improved by applying a simple convolution filter before computing the sample covariance. Furthermore, these signal processing parameters can be learned quickly using a supervised optimization algorithm.
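The filter-then-invert pipeline can be sketched as follows. The sketch assumes a fixed first-difference kernel in place of the learned filter. The kernel, the synthetic random-walk traces, and the function names are hypothetical, and the supervised learning of the filter weights is not shown. Off-diagonal partial correlations serve as connectivity scores.

```python
import numpy as np


def filtered_precision(signals, kernel):
    """Convolve each neuron's trace with `kernel`, then invert the sample covariance.

    signals: array of shape (n_neurons, n_timesteps); kernel: 1-D filter weights.
    """
    filtered = np.array([np.convolve(s, kernel, mode="valid") for s in signals])
    cov = np.cov(filtered)  # rows are variables (neurons), columns are time points
    return np.linalg.inv(cov)


def partial_correlations(precision):
    """Rescale the precision matrix to partial correlations; off-diagonals score edges."""
    d = np.sqrt(np.diag(precision))
    pc = -precision / np.outer(d, d)
    np.fill_diagonal(pc, 0.0)
    return pc


# Hypothetical usage: synthetic random-walk traces and a first-difference filter.
rng = np.random.default_rng(0)
signals = rng.standard_normal((5, 1000)).cumsum(axis=1)
kernel = np.array([1.0, -1.0])
scores = partial_correlations(filtered_precision(signals, kernel))
```

Because the filter is applied before the covariance is formed, its weights become tunable parameters of the whole estimator. This is what makes learning them with an optimization algorithm possible.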

convolution filter, inverse covariance estimation, network connectivity

Technology:

Information Technology > Communications > Networks (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.61)

Stochastic Network Design in Bidirected Trees

Wu, Xiaojian, Sheldon, Daniel R., Zilberstein, Shlomo

Neural Information Processing Systems, Feb-14-2020, 06:59:42 GMT

We investigate the problem of stochastic network design in bidirected trees. In this problem, an underlying phenomenon (e.g., a behavior, rumor, or disease) starts at multiple sources in a tree and spreads in both directions along its edges. Actions can be taken to increase the probability of propagation on edges, and the goal is to maximize the total amount of spread away from all sources. Our main result is a rounded dynamic programming approach that leads to a fully polynomial-time approximation scheme (FPTAS), that is, an algorithm that can find (1-ε)-optimal solutions for any problem instance in time polynomial in the input size and 1/ε. Our algorithm outperforms competing approaches on a motivating problem from computational sustainability: removing barriers in river networks to restore the health of aquatic ecosystems.
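To make the objective concrete, here is a simplified version of the spread computation. It assumes independent edges, a separate probability for each direction of a tree edge, unit node weights, and spread summed separately over sources. These modeling choices and the function name are illustrative assumptions. The paper's rounded dynamic program for choosing actions is not shown; an action would simply raise some entries of `edges`.

```python
from collections import defaultdict


def expected_spread(edges, sources):
    """Expected number of nodes reached, summed over sources, in a bidirected tree.

    edges: dict mapping a directed pair (u, v) to the probability that the
    phenomenon crosses tree edge {u, v} in the direction u -> v. Edges are
    assumed independent, so the probability that v is reached from s is the
    product of directed edge probabilities on the unique s -> v path.
    """
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
    total = 0.0
    for s in sources:
        stack = [(s, None, 1.0)]  # (node, parent on the path from s, reach probability)
        while stack:
            node, parent, prob = stack.pop()
            total += prob
            for nxt in neighbors[node]:
                if nxt != parent:  # a tree has no cycles, so skipping the parent suffices
                    stack.append((nxt, node, prob * edges[(node, nxt)]))
    return total


# Path 0 - 1 - 2 with different probabilities in each direction.
edges = {(0, 1): 0.5, (1, 0): 0.9, (1, 2): 0.5, (2, 1): 0.8}
spread_from_0 = expected_spread(edges, [0])  # 1 + 0.5 + 0.5 * 0.5
```

Because reach probabilities multiply along tree paths, the objective decomposes over subtrees. That structure is what makes a dynamic-programming treatment of trees natural.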

bidirected tree, stochastic network design

Technology:

Information Technology > Communications > Networks (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Differentiable Learning of Submodular Models

Djolonga, Josip, Krause, Andreas

Neural Information Processing Systems, Feb-14-2020, 06:43:03 GMT

Can we incorporate discrete optimization algorithms within modern machine learning models? For example, is it possible to use, in deep architectures, a layer whose output is the minimum cut of a parametrized graph? Because these models are trained end-to-end by leveraging gradient information, introducing such layers seems very challenging: their output is discontinuous. In this paper we focus on the problem of submodular minimization, for which we show that such layers are indeed possible. The key idea is that we can continuously relax the output without sacrificing guarantees.
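One standard continuous relaxation of a submodular function is its Lovász extension. It agrees with the set function on 0/1 indicator vectors, it is convex exactly when the function is submodular, and it is computed by sorting the coordinates. The sketch below illustrates that kind of relaxation on a hypothetical unit-weight graph-cut function. It is not necessarily the construction used in the paper, and the function names are assumptions.

```python
def lovasz_extension(F, x):
    """Value and a subgradient of the Lovász extension of F at x.

    Assumes F(frozenset()) == 0. Elements are added in decreasing order of x,
    and each coordinate of the subgradient is that element's marginal gain.
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])
    grad = [0.0] * len(x)
    chosen, prev = set(), F(frozenset())
    for i in order:
        chosen.add(i)
        cur = F(frozenset(chosen))
        grad[i] = cur - prev  # marginal gain of adding element i
        prev = cur
    value = sum(xi * gi for xi, gi in zip(x, grad))
    return value, grad


def make_cut_function(edges):
    """Cut function of an undirected graph with unit edge weights (a submodular function)."""
    def cut(S):
        return sum(1 for u, v in edges if (u in S) != (v in S))
    return cut


cut = make_cut_function([(0, 1), (1, 2)])
value, subgrad = lovasz_extension(cut, [0.3, 0.9, 0.1])
```

For a cut function the extension equals $\sum_{(u,v)}|x_u-x_v|$, so `value` here is $0.6+0.8=1.4$. The returned subgradient is piecewise constant in $x$, which is why naive differentiation through a discrete minimizer gives little signal. A relaxation with usable gradients is the problem the paper addresses.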

differentiable learning, submodular model

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

A Nonconvex Optimization Framework for Low Rank Matrix Estimation

Zhao, Tuo, Wang, Zhaoran, Liu, Han

Neural Information Processing Systems, Feb-14-2020, 06:41:56 GMT

We study the estimation of low rank matrices via nonconvex optimization. Compared with convex relaxation, nonconvex optimization exhibits superior empirical performance for large-scale instances of low rank matrix estimation. However, the understanding of its theoretical guarantees is limited. In this paper, we define the notion of projected oracle divergence, based on which we establish sufficient conditions for the success of nonconvex optimization. We illustrate the consequences of this general framework for matrix sensing and completion. In particular, we prove that a broad class of nonconvex optimization algorithms, including alternating minimization and gradient-type methods, geometrically converge to the global optimum and exactly recover the true low rank matrices under standard conditions.
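Alternating minimization, one of the algorithm classes covered, can be sketched for matrix completion as follows. The sketch assumes a known rank, a boolean mask of observed entries, random initialization, and a tiny ridge term for numerical stability. These are illustrative choices, not the initialization or conditions analyzed in the paper, and the synthetic data and function name are hypothetical.

```python
import numpy as np


def als_complete(M_obs, mask, rank, iters=100, reg=1e-6, seed=0):
    """Alternating minimization for matrix completion: fit M ~ U @ V.T on observed entries.

    Each half-step fixes one factor and solves a small ridge-regularized
    least-squares problem for every row of the other factor.
    """
    rng = np.random.default_rng(seed)
    m, n = M_obs.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(m):  # update row i of U using the observed entries in row i
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ M_obs[i, mask[i]])
        for j in range(n):  # update row j of V using the observed entries in column j
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ M_obs[mask[:, j], j])
    return U @ V.T


# Synthetic rank-2 matrix with roughly 70% of its entries observed.
rng = np.random.default_rng(1)
truth = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random((30, 30)) < 0.7
estimate = als_complete(np.where(mask, truth, 0.0), mask, rank=2)
rel_err = np.linalg.norm(estimate - truth) / np.linalg.norm(truth)
```

Each subproblem is convex even though the joint problem in $(U, V)$ is not. Guarantees such as the paper's explain when alternating between these easy subproblems still reaches the global optimum.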

low rank matrix estimation, nonconvex optimization framework

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.77)
Information Technology > Artificial Intelligence > Machine Learning (0.56)