AITopics | Optimization

It has recently been observed that certain extremely simple feature encoding techniques are able to achieve state of the art performance on several standard image classification benchmarks including deep belief networks, convolutional nets, factored RBMs, mcRBMs, convolutional RBMs, sparse autoencoders and several others. Moreover, these "triangle" or "soft threshold" encodings are ex- tremely efficient to compute. Several intuitive arguments have been put forward to explain this remarkable performance, yet no mathematical justification has been offered. The main result of this report is to show that these features are realized as an approximate solution to the a non-negative sparse coding problem. Using this connection we describe several variants of the soft threshold features and demonstrate their effectiveness on two image classification benchmark tasks.

artificial intelligence, machine learning, sparse, (17 more...)

arXiv.org Machine Learning

1208.0959

Country: North America (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Greedy Sparsity-Constrained Optimization

Bahmani, Sohail, Raj, Bhiksha, Boufounos, Petros

arXiv.org Machine LearningJan-6-2013

Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and compressive Sensing. A vast body of work has studied the sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models where the fidelity of the estimate is measured by the squared error. In contrast, relatively less effort has been made in the study of sparsity-constrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressive Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic data, where the algorithm is employed for sparse logistic regression with and without $\ell_2$-regularization.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1203.5483

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

Moldovan, Teodor M., Abbeel, Pieter

Neural Information Processing SystemsDec-31-2012

The expected return is a widely used objective in decision making under uncertainty. Many algorithms, such as value iteration, have been proposed to optimize it. In risk-aware settings, however, the expected return is often not an appropriate objective to optimize. We propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties. We also draw connections to previously proposed objectives for risk-aware planing: minmax, exponential utility, percentile and mean minus variance. Our method applies to an extended class of Markov decision processes: we allow costs to be stochastic as long as they are bounded. Additionally, we present an efficient algorithm for optimizing the proposed objective. Synthetic and real-world experiments illustrate the effectiveness of our method, at scale.

algorithm, objective, optimization, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Caddo Parish > Shreveport (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(3 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Consumer Products & Services > Travel (1.00)
Transportation > Infrastructure & Services > Airport (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Iyer, Rishabh, Bilmes, Jeff A.

Neural Information Processing SystemsDec-31-2012

We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures.

bregman divergence, divergence, submodular bregman, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Add feedback

Proximal Newton-type methods for convex optimization

Lee, Jason D., Sun, Yuekai, Saunders, Michael

Neural Information Processing SystemsDec-31-2012

R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

approximation, newton-type method, subproblem, (14 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > California > Santa Clara County > Stanford (0.04)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

Moldovan, Teodor M., Abbeel, Pieter

Neural Information Processing SystemsDec-31-2012

The expected return is a widely used objective in decision making under uncertainty. Many algorithms, such as value iteration, have been proposed to optimize it. In risk-aware settings, however, the expected return is often not an appropriate objective to optimize. We propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties. We also draw connections to previously proposed objectives for risk-aware planing: minmax, exponential utility, percentile and mean minus variance. Our method applies to an extended class of Markov decision processes: we allow costs to be stochastic as long as they are bounded. Additionally, we present an efficient algorithm for optimizing the proposed objective. Synthetic and real-world experiments illustrate the effectiveness of our method, at scale.

algorithm, objective, optimization, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Caddo Parish > Shreveport (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(3 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Consumer Products & Services > Travel (1.00)
Transportation > Infrastructure & Services > Airport (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Iyer, Rishabh, Bilmes, Jeff A.

Neural Information Processing SystemsDec-31-2012

We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures.

bregman divergence, divergence, submodular bregman, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Add feedback

Proximal Newton-type methods for convex optimization

Lee, Jason D., Sun, Yuekai, Saunders, Michael

Neural Information Processing SystemsDec-31-2012

R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

approximation, newton-type method, subproblem, (14 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > California > Santa Clara County > Stanford (0.04)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Approximating Concavely Parameterized Optimization Problems

Giesen, Joachim, Mueller, Jens, Laue, Soeren, Swiercy, Sascha

Neural Information Processing SystemsDec-31-2012

We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the parameter can always be approximated with accuracy $\varepsilon >0$ by a set of size $O(1/\sqrt{\varepsilon})$. A lower bound of size $\Omega (1/\sqrt{\varepsilon})$ shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an approximate path of size $O(1/\sqrt{\varepsilon})$. Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion.

algorithm, complexity, monotonously, (13 more...)

Neural Information Processing Systems

Country:

Europe > Germany (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.70)

Add feedback

Multi-Stage Multi-Task Feature Learning

Gong, Pinghua, Ye, Jieping, Zhang, Chang-shui

Neural Information Processing SystemsDec-31-2012

Multi-task sparse feature learning aims to improve the generalization performance by exploiting the shared features among tasks. It has been successfully applied to many applications including computer vision and biomedical informatics. Most of the existing multi-task sparse feature learning algorithms are formulated as a convex sparse regularization problem, which is usually suboptimal, due to its looseness for approximating an $\ell_0$-type regularizer. In this paper, we propose a non-convex formulation for multi-task sparse feature learning based on a novel regularizer. To solve the non-convex optimization problem, we propose a Multi-Stage Multi-Task Feature Learning (MSMTFL) algorithm. Moreover, we present a detailed theoretical analysis showing that MSMTFL achieves a better parameter estimation error bound than the convex formulation. Empirical studies on both synthetic and real-world data sets demonstrate the effectiveness of MSMTFL in comparison with the state of the art multi-task sparse feature learning algorithms.

algorithm, estimation error, parameter estimation error, (12 more...)

Neural Information Processing Systems

Country: