AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

Regularized Nonlinear Acceleration

Scieur, Damien, d', Aspremont, Alexandre, Bach, Francis

Neural Information Processing SystemsFeb-14-2020, 06:28:02 GMT

We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.

optimization method, regularized nonlinear acceleration

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.62)

Add feedback

On the Global Linear Convergence of Frank-Wolfe Optimization Variants

Lacoste-Julien, Simon, Jaggi, Martin

Neural Information Processing SystemsFeb-14-2020, 06:26:37 GMT

The Frank-Wolfe (FW) optimization algorithm has lately re-gained popularity thanks in particular to its ability to nicely handle the structured constraints appearing in machine learning applications. However, its convergence rate is known to be slow (sublinear) when the solution lies at the boundary. A simple less-known fix is to add the possibility to take away steps' during optimization, an operation that importantly does not require a feasibility oracle. In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that has been successfully applied in practice: FW with away steps, pairwise FW, fully-corrective FW and Wolfe's minimum norm point algorithm, and prove for the first time that they all enjoy global linear convergence under a weaker condition than strong convexity. The constant in the convergence rate has an elegant interpretation as the product of the (classical) condition number of the function with a novel geometric quantity that plays the role of the condition number' of the constraint set.

algorithm, frank-wolfe optimization variant, global linear convergence, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

Optimizing affinity-based binary hashing using auxiliary coordinates

Raziperchikolaei, Ramin, Carreira-Perpinan, Miguel A.

Neural Information Processing SystemsFeb-14-2020, 06:25:59 GMT

In supervised binary hashing, one wants to learn a function that maps a high-dimensional feature vector to a vector of binary codes, for application to fast image retrieval. This typically results in a difficult optimization problem, nonconvex and nonsmooth, because of the discrete variables involved. Much work has simply relaxed the problem during training, solving a continuous optimization, and truncating the codes a posteriori. This gives reasonable results but is quite suboptimal. Recent work has tried to optimize the objective directly over the binary codes and achieved better results, but the hash function was still learned a posteriori, which remains suboptimal.

auxiliary coordinate, hash function, optimizing affinity-based binary, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.80)

Add feedback

Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees

Locatello, Francesco, Tschannen, Michael, Raetsch, Gunnar, Jaggi, Martin

Neural Information Processing SystemsFeb-14-2020, 06:12:33 GMT

Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear (O(1/t)) convergence on general smooth and convex objectives, and linear convergence (O(e {-t})) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature.

cone constrained optimization, convergence guarantee, greedy algorithm, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.75)

Add feedback

HONOR: Hybrid Optimization for NOn-convex Regularized problems

Gong, Pinghua, Ye, Jieping

Neural Information Processing SystemsFeb-14-2020, 05:58:28 GMT

Recent years have witnessed the superiority of non-convex sparse learning formulations over their convex counterparts in both theory and practice. However, due to the non-convexity and non-smoothness of the regularizer, how to efficiently solve the non-convex optimization problem for large-scale data is still quite challenging. Specifically, we develop a hybrid scheme which effectively integrates a Quasi-Newton (QN) step and a Gradient Descent (GD) step. Our contributions are as follows: (1) HONOR incorporates the second-order information to greatly speed up the convergence, while it avoids solving a regularized quadratic programming and only involves matrix-vector multiplications without explicitly forming the inverse Hessian matrix. Papers published at the Neural Information Processing Systems Conference.

hybrid optimization, non-convex regularized problem, underline, (4 more...)

Neural Information Processing Systems

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.63)

Add feedback

Approximating Sparse PCA from Incomplete Data

KUNDU, ABHISEK, Drineas, Petros, Magdon-Ismail, Malik

Neural Information Processing SystemsFeb-14-2020, 05:56:48 GMT

We study how well one can recover sparse principal componentsof a data matrix using a sketch formed from a few of its elements. We show that for a wide class of optimization problems,if the sketch is close (in the spectral norm) to the original datamatrix, then one can recover a near optimal solution to the optimizationproblem by using the sketch. In particular, we use this approach toobtain sparse principal components and show that for \math{m} data pointsin \math{n} dimensions,\math{O(\epsilon {-2}\tilde k\max\{m,n\})} elements gives an\math{\epsilon}-additive approximation to the sparse PCA problem(\math{\tilde k} is the stable rank of the data matrix).We demonstrate our algorithms extensivelyon image, text, biological and financial data.The results show that not only are we able to recover the sparse PCAs from the incomplete data, but by using our sparse sketch, the running timedrops by a factor of five or more. Papers published at the Neural Information Processing Systems Conference.

approximating sparse pca, incomplete data, sketch, (2 more...)

Neural Information Processing Systems

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Add feedback

Multi-Task Learning as Multi-Objective Optimization

Sener, Ozan, Koltun, Vladlen

Neural Information Processing SystemsFeb-14-2020, 05:56:45 GMT

In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature.

multi-objective optimization, multi-task learning, pareto optimal solution, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

Inference in Graphical Models via Semidefinite Programming Hierarchies

Erdogdu, Murat A., Deshpande, Yash, Montanari, Andrea

Neural Information Processing SystemsFeb-14-2020, 05:42:21 GMT

Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with $n$ vertices require solving an SDP with $n {\Theta(d)}$ variables where $d$ is the degree in the hierarchy. In practice, for $d\ge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency.

graphical model, relaxation, semidefinite programming hierarchy, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.41)

Add feedback

Smooth and Strong: MAP Inference with Linear Convergence

Meshi, Ofer, Mahdavi, Mehrdad, Schwing, Alex

Neural Information Processing SystemsFeb-14-2020, 05:41:09 GMT

Maximum a-posteriori (MAP) inference is an important task for many applications. Although the standard formulation gives rise to a hard combinatorial optimization problem, several effective approximations have been proposed and studied in recent years. We focus on linear programming (LP) relaxations, which have achieved state-of-the-art performance in many applications. However, optimization of the resulting program is in general challenging due to non-smoothness and complex non-separable constraints.Therefore, in this work we study the benefits of augmenting the objective function of the relaxation with strong convexity. Specifically, we introduce strong convexity by adding a quadratic term to the LP relaxation objective. We provide theoretical guarantees for the resulting programs, bounding the difference between their optimal value and the original optimum.

linear convergence, map inference, strong convexity, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

Augmentative Message Passing for Traveling Salesman Problem and Graph Partitioning

Ravanbakhsh, Siamak, Rabbany, Reihaneh, Greiner, Russell

Neural Information Processing SystemsFeb-14-2020, 05:27:18 GMT

The cutting plane method is an augmentative constrained optimization procedure that is often used with continuous-domain optimization techniques such as linear and convex programs. We investigate the viability of a similar idea within message passing -- for integral solutions -- in the context of two combinatorial problems: 1) For Traveling Salesman Problem (TSP), we propose a factor-graph based on Held-Karp formulation, with an exponential number of constraint factors, each of which has an exponential but sparse tabular form. In both cases we are able to derive surprisingly simple message updates that lead to competitive solutions on benchmark instances. In particular for TSP we are able to find near-optimal solutions in the time that empirically grows with $N 3$, demonstrating that augmentation is practical and efficient. Papers published at the Neural Information Processing Systems Conference.

augmentative message passing, salesman problem and graph partitioning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback