AITopics | Optimization

Optimization

Alternating Direction Methods for Latent Variable Gaussian Graphical Model Selection

arXiv.org Machine Learning, Sep-26-2012

Chandrasekaran, Parrilo and Willsky (2010) proposed a convex optimization problem to characterize graphical model selection in the presence of unobserved variables. This convex optimization problem aims to estimate, from sample data, an inverse covariance matrix that can be decomposed into a sparse matrix minus a low-rank matrix. Solving this convex optimization problem is very challenging, especially for large problems. In this paper, we propose two alternating direction methods for solving this problem. The first method applies the classical alternating direction method of multipliers to solve the problem as a consensus problem. The second method is a proximal gradient based alternating direction method of multipliers. Our methods exploit the special structure of the problem and thus can solve large problems very efficiently. Global convergence results are established for the proposed methods. Numerical results on both synthetic data and gene expression data show that our methods usually solve problems with one million variables in one to two minutes, and are usually five to thirty-five times faster than a state-of-the-art Newton-CG proximal point algorithm.
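
Alternating direction methods for a sparse-minus-low-rank decomposition typically alternate between cheap proximal steps. The sketch below shows two standard building blocks only, not the authors' full algorithm: entrywise soft-thresholding for an l1 penalty on the sparse part, and eigenvalue thresholding for a trace penalty on a positive semidefinite low-rank part. The function names and the parameter `tau` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1 (entrywise shrinkage toward zero)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def psd_trace_prox(A, tau):
    """Proximal operator of tau * trace(L) over symmetric PSD matrices L.

    Shifts every eigenvalue down by tau and clips negatives to zero,
    which tends to produce a low-rank result.
    """
    A = (A + A.T) / 2.0              # symmetrize against round-off
    w, V = np.linalg.eigh(A)         # eigenvalues w, orthonormal eigenvectors V
    w = np.maximum(w - tau, 0.0)
    return (V * w) @ V.T             # rebuild V diag(w) V^T

# Small illustration on hand-picked matrices.
S = soft_threshold(np.array([[3.0, -0.5], [0.2, -2.0]]), 1.0)
L = psd_trace_prox(np.diag([3.0, 0.5]), 1.0)
```

Both operators cost at most one eigendecomposition per call, which is the kind of problem structure such methods can exploit.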

artificial intelligence, direction method, machine learning, (17 more...)

arXiv: 1206.1275

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

The Issue-Adjusted Ideal Point Model

Gerrish, Sean M., Blei, David M.

arXiv.org Machine Learning, Sep-26-2012

Legislative behavior centers around the votes made by lawmakers. These votes are captured in roll call data, a matrix with lawmakers in the rows and proposed legislation in the columns. We illustrate a sample of roll call votes for the United States Senate in Figure 1. The seminal work of Poole and Rosenthal (1985) introduced the ideal point model, using roll call data to infer the latent political positions of the lawmakers. The ideal point model is a latent factor model of binary data and an application of item-response theory (Lord 1980) to roll call data. It gives each lawmaker a latent political position along a single dimension and then uses these points (called the ideal points) in a model of the votes.
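
A minimal sketch of a one-dimensional ideal point model may help make the description concrete. It assumes a logistic link and two per-bill parameters, here called `polarity` and `popularity`; these names and the logistic form are illustrative assumptions, not details stated in the abstract.

```python
import numpy as np

def vote_probability(ideal_points, polarity, popularity):
    """Probability of a 'yes' vote for every (lawmaker, bill) pair.

    ideal_points: shape (n_lawmakers,), latent position of each lawmaker
    polarity:     shape (n_bills,), how strongly a bill separates positions
    popularity:   shape (n_bills,), baseline support for a bill
    Returns an array of shape (n_lawmakers, n_bills).
    """
    logits = np.outer(ideal_points, polarity) + popularity
    return 1.0 / (1.0 + np.exp(-logits))

# Two lawmakers at opposite positions, one polarizing bill.
p = vote_probability(np.array([-2.0, 2.0]), np.array([1.0]), np.array([0.0]))
```

With these assumed parameters, lawmakers on opposite sides of the axis receive complementary probabilities for the same bill.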

data mining, machine learning, natural language, (20 more...)

arXiv: 1209.6004

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Law > Statutes (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
(3 more...)

Towards a learning-theoretic analysis of spike-timing dependent plasticity

Balduzzi, David, Besserve, Michel

arXiv.org Machine Learning, Sep-25-2012

This paper suggests a learning-theoretic perspective on how synaptic plasticity benefits global brain functioning. We introduce a model, the selectron, that (i) arises as the fast time constant limit of leaky integrate-and-fire neurons equipped with spike-timing dependent plasticity (STDP) and (ii) is amenable to theoretical analysis. We show that the selectron encodes reward estimates into spikes and that an error bound on spikes is controlled by a spiking margin and the sum of synaptic weights. Moreover, the efficacy of spikes (their usefulness to other reward maximizing selectrons) also depends on total synaptic strength. Finally, based on our analysis, we propose a regularized version of STDP, and show that the regularization improves the robustness of neuronal learning when faced with multiple stimuli.

artificial intelligence, machine learning, spike, (18 more...)

arXiv: 1209.5549

Country: Europe > Germany (0.28)

Genre: Research Report (0.84)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.31)

On Constrained Spectral Clustering and Its Applications

Wang, Xiang, Qian, Buyue, Davidson, Ian

arXiv.org Machine Learning, Sep-21-2012

Constrained clustering has been well studied for algorithms such as $K$-means and hierarchical clustering. However, satisfying many constraints in these algorithmic settings has been shown to be intractable. One alternative for encoding many constraints is to use spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link and Cannot-Link constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in Must-Link and Cannot-Link constraints; it guarantees a lower bound, set by a user-specified threshold, on how well the given constraints are satisfied; it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding a large number of constraints: transfer learning via constraints.

artificial intelligence, constraint, machine learning, (17 more...)

doi: 10.1007/s10618-012-0291-9

arXiv: 1201.5338

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Link Prediction in Graphs with Autoregressive Features

Richard, Emile, Gaiffas, Stephane, Vayatis, Nicolas

arXiv.org Machine Learning, Sep-14-2012

In this paper, we consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both the sparsity and low-rank properties of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank. The estimate is computed efficiently using proximal methods through a generalized forward-backward algorithm.

graph, inequality, matrix, (15 more...)

arXiv: 1209.323

Country:

Europe > France > Île-de-France > Val-de-Marne > Cachan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Query Complexity of Derivative-Free Optimization

Jamieson, Kevin G., Nowak, Robert D., Recht, Benjamin

arXiv.org Machine Learning, Sep-11-2012

This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
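
To illustrate what optimizing with comparisons alone can look like, here is a minimal sketch: a one-dimensional ternary search that never reads a function value and only asks which of two points is better. It is not the algorithm proposed in the paper; it assumes a noise-free comparison oracle and a strictly convex objective, and the name `is_better` is hypothetical.

```python
def minimize_by_comparisons(is_better, lo, hi, tol=1e-6):
    """Locate the minimizer of a strictly convex 1-D function on [lo, hi].

    is_better(x, y) must return True when f(x) < f(y); the function
    values themselves are never observed.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if is_better(m1, m2):
            hi = m2      # minimizer cannot lie to the right of m2
        else:
            lo = m1      # minimizer cannot lie to the left of m1
    return (lo + hi) / 2.0

# The oracle hides f(x) = (x - 2)^2 behind a Boolean answer.
f = lambda x: (x - 2.0) ** 2
x_star = minimize_by_comparisons(lambda a, b: f(a) < f(b), -10.0, 10.0)
```

Each comparison shrinks the search interval by a factor of two thirds, so the number of queries grows logarithmically in the requested accuracy in this noise-free setting.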

artificial intelligence, machine learning, natural language, (17 more...)

arXiv: 1209.2434

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.40)

On the Equivalence between Herding and Conditional Gradient Algorithms

Bach, Francis, Lacoste-Julien, Simon, Obozinski, Guillaume

arXiv.org Machine Learning, Sep-11-2012

We show that the herding procedure of Welling (2009) takes exactly the form of a standard convex optimization algorithm, namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. The experiments indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm tends to approach the maximum entropy distribution more often, shedding more light on the learning bias behind herding.
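
The connection can be sketched in a few lines. The code below runs a conditional gradient (Frank-Wolfe) iteration on the quadratic moment discrepancy 0.5 * ||g - mu||^2 over the convex hull of a finite set of feature vectors, with step size 1/(t+1) so that the iterate is the plain average of the selected points. The finite candidate set, the explicit `features` array, and the zero starting point are assumptions made for illustration.

```python
import numpy as np

def herding_conditional_gradient(features, target_mean, n_steps):
    """Pick candidates whose running feature average approaches target_mean.

    features:    shape (n_candidates, dim), one feature vector per candidate
    target_mean: shape (dim,), the moment vector to be matched
    Returns the chosen indices and the final running average.
    """
    g = np.zeros_like(target_mean, dtype=float)
    chosen = []
    for t in range(n_steps):
        # Linear minimization step: the vertex most aligned with the
        # negative gradient (target_mean - g) of the discrepancy.
        i = int(np.argmax(features @ (target_mean - g)))
        rho = 1.0 / (t + 1)                  # uniform-weight step size
        g = (1.0 - rho) * g + rho * features[i]
        chosen.append(i)
    return chosen, g

# Two candidates with orthogonal features and a balanced target.
chosen, g = herding_conditional_gradient(np.eye(2), np.array([0.5, 0.5]), 4)
```

In this toy case the selection alternates between the two candidates, and the running average returns to the target after every second step.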

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv: 1203.4523

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

Optimal measures and Markov transition kernels

Belavkin, Roman V.

arXiv.org Machine Learning, Sep-5-2012

We study optimal solutions to an abstract optimization problem for measures, which is a generalization of classical variational problems in information theory and statistical physics. In the classical problems, information and relative entropy are defined using the Kullback-Leibler divergence, and for this reason optimal measures belong to a one-parameter exponential family. Measures within such a family have the property of mutual absolute continuity. Here we show that this property characterizes other families of optimal positive measures if a functional representing information has a strictly convex dual. Mutual absolute continuity of optimal probability measures allows us to strictly separate deterministic and non-deterministic Markov transition kernels, which play an important role in theories of decisions, estimation, control, communication and computation. We show that deterministic transitions are strictly sub-optimal, unless the information resource with a strictly convex dual is unconstrained. For illustration, we construct an example where, unlike non-deterministic kernels, any deterministic kernel either has negatively infinite expected utility (unbounded expected error) or communicates infinite information.
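
For the classical Kullback-Leibler case mentioned above, the one-parameter exponential family can be written out explicitly. The symbols below ($q$ for a reference measure, $u$ for a utility function, $\beta$ for the parameter, $Z$ for the normalizer) are chosen here for illustration and are not taken from the paper.

```latex
p_\beta(dx) = \frac{e^{\beta u(x)}}{Z(\beta)}\, q(dx),
\qquad
Z(\beta) = \int e^{\beta u(x)}\, q(dx)
```

Wherever $u$ is finite the density $e^{\beta u}/Z(\beta)$ is strictly positive, so $p_\beta$ and $q$ have the same null sets; this is the mutual absolute continuity property the abstract refers to.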

information, kernel, transition kernel, (16 more...)

doi: 10.1007/s10898-012-9851-1

arXiv: 1012.0366

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Proximal methods for the latent group lasso penalty

Villa, Silvia, Rosasco, Lorenzo, Mosci, Sofia, Verri, Alessandro

arXiv.org Machine Learning, Sep-3-2012

We consider a regularized least squares problem, with regularization by structured sparsity-inducing norms, which extend the usual $\ell_1$ and group lasso penalties by allowing the subsets to overlap. Such regularizations lead to nonsmooth problems that are difficult to optimize, and we propose in this paper a suitable version of an accelerated proximal method to solve them. We prove convergence of a nested procedure, obtained by composing an accelerated proximal method with an inner algorithm for computing the proximity operator. By exploiting the geometrical properties of the penalty, we devise a new active set strategy, thanks to which the inner iteration is relatively fast, thus guaranteeing good computational performance of the overall algorithm. Our approach makes it possible to handle high-dimensional problems without pre-processing for dimensionality reduction, leading to better computational and prediction performance than state-of-the-art methods, as shown empirically on both toy and real data.
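
For orientation, the proximity operator has a closed form in the simpler case where the groups do not overlap: each block is shrunk toward zero by its norm. The sketch below covers only that special case; the overlapping setting treated in the paper requires an inner algorithm that is not reproduced here. The names `groups` and `tau` are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(w, groups, tau):
    """Proximal operator of tau * sum_g ||w_g||_2 for disjoint groups.

    w:      1-D coefficient vector
    groups: list of index lists that partition the coordinates of w
    tau:    non-negative threshold
    """
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        block = w[idx]
        norm = np.linalg.norm(block)
        if norm > tau:
            # Shrink the whole block; blocks with norm <= tau stay at zero.
            out[idx] = (1.0 - tau / norm) * block
    return out

# First block has norm 5 and is scaled by 0.8; second block is zeroed.
v = group_soft_threshold(np.array([3.0, 4.0, 0.1, 0.1]), [[0, 1], [2, 3]], 1.0)
```

Groups of size one reduce this operator to ordinary entrywise soft-thresholding, which is the $\ell_1$ case.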

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv: 1209.0368

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Practical Bayesian Optimization of Machine Learning Algorithms

Snoek, Jasper, Larochelle, Hugo, Adams, Ryan P.

arXiv.org Machine Learning, Aug-29-2012

Machine learning algorithms are rarely parameter-free; whether via the properties of a regularizer, the hyperprior of a generative model, or the step size of a gradient-based optimization, learning procedures almost always require a set of high-level choices that significantly impact generalization performance. As a practitioner, one is usually able to specify the general framework of an inductive bias much more easily than the particular weighting that it should have relative to training data. As a result, these high-level parameters are often considered a nuisance, making it desirable to develop algorithms with as few of these "knobs" as possible. Another, more flexible take on this issue is to view the optimization of high-level parameters as a procedure to be automated. Specifically, we could view such tuning as the optimization of an unknown black-box function that reflects generalization performance and invoke algorithms developed for such problems. These optimization problems have a somewhat different flavor than the low-level objectives one often encounters as part of a training procedure: here function evaluations are very expensive, as they involve running the primary machine learning algorithm to completion. In this setting, it is desirable to spend computational time making better choices about where to seek the best parameters. Bayesian optimization (Mockus et al., 1978) provides an elegant approach and has been shown to outperform other state-of-the-art global optimization algorithms on a number of challenging optimization benchmark functions (Jones, 2001).
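
One way to decide where to evaluate next is to score candidate settings with an acquisition function computed from a surrogate model's predictions. The sketch below implements expected improvement for a minimization problem, assuming a Gaussian predictive distribution with strictly positive standard deviation at each candidate. The abstract above does not say which acquisition function is used, so this is one common choice shown for illustration; all argument names are assumptions.

```python
import math
import numpy as np

def expected_improvement(mean, std, best):
    """Expected improvement over `best` when minimizing.

    mean, std: predictive mean and standard deviation (std > 0) at each
               candidate, e.g. from a Gaussian process surrogate
    best:      lowest objective value observed so far
    """
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    z = (best - mean) / std
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

# Two candidates with equal uncertainty; the first is predicted to be better.
ei = expected_improvement([0.0, 1.0], [1.0, 1.0], best=0.0)
next_index = int(np.argmax(ei))
```

The score rewards both a low predicted value and a high predictive uncertainty, which is how an expensive evaluation budget can be spent on promising or unexplored settings.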

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv: 1206.2944

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
