Lower Bounds on Rate of Convergence of Cutting Plane Methods
Zhang, Xinhua, Saha, Ankan, Vishwanathan, S. V. N.
In a recent paper, Joachims (2006) presented SVM-Perf, a cutting plane method (CPM) for training linear Support Vector Machines (SVMs) which converges to an $\epsilon$-accurate solution in $O(1/\epsilon^{2})$ iterations. By tightening the analysis, Teo et al. (2010) showed that $O(1/\epsilon)$ iterations suffice. Given the impressive convergence speed of CPM on a number of practical problems, it was conjectured that these rates could be further improved. In this paper we disprove this conjecture. We present counterexamples which apply not only to training linear SVMs with the hinge loss, but also to support vector methods which optimize a \emph{multivariate} performance score. Surprisingly, however, these problems are not inherently hard. By exploiting the structure of the objective function we can devise an algorithm that converges in $O(1/\sqrt{\epsilon})$ iterations.
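The iteration scheme analyzed here can be sketched as follows: at each step the empirical risk is lower-bounded by a new cutting plane (a first-order bound at the current iterate), and the regularized piecewise-linear model is re-minimized. Below is a minimal Python sketch of such a loop for the regularized hinge loss, in the spirit of SVM-Perf/BMRM; the helper names, the SLSQP inner solver, and the problem setup are illustrative assumptions, not the authors' implementation.

```python
# Minimal cutting-plane loop for
#   J(w) = (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>).
# Illustrative sketch only; not the paper's code.
import numpy as np
from scipy.optimize import minimize

def hinge_risk_and_subgradient(w, X, y):
    """Empirical hinge risk and one subgradient at w."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    risk = np.maximum(margins, 0.0).mean()
    grad = -(X[active].T @ y[active]) / len(y)
    return risk, grad

def cutting_plane_svm(X, y, lam=1.0, n_iters=50):
    n, d = X.shape
    w = np.zeros(d)
    planes = []  # each plane (a, b) encodes the lower bound risk(w) >= a @ w + b
    for _ in range(n_iters):
        risk, g = hinge_risk_and_subgradient(w, X, y)
        planes.append((g, risk - g @ w))
        # Inner problem: min_{w, xi} (lam/2)||w||^2 + xi  s.t.  xi >= a @ w + b for all planes.
        def model(z):
            return 0.5 * lam * np.dot(z[:d], z[:d]) + z[d]
        constraints = [{'type': 'ineq',
                        'fun': lambda z, a=a, b=b: z[d] - (a @ z[:d] + b)}
                       for a, b in planes]
        z = minimize(model, np.append(w, 0.0), constraints=constraints, method='SLSQP').x
        w = z[:d]
    return w
```

The counterexamples in the paper concern exactly how many such iterations this style of update may need before the model gap drops below $\epsilon$.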
A Family of Penalty Functions for Structured Sparsity
Morales, Jean, Micchelli, Charles A., Pontil, Massimiliano
We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. We present a family of convex penalty functions, which encode this prior knowledge by means of a set of constraints on the absolute values of the regression coefficients. This family subsumes the $\ell_1$ norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish some important properties of these functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso and other related methods.
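One way to see how constraints on auxiliary variables tied to the absolute values of the coefficients can generate such a family is via the standard quadratic variational identity (a sketch of the general construction; the paper's exact definition and notation may differ):

$$|\beta_j| \;=\; \min_{\lambda_j > 0} \tfrac{1}{2}\left(\frac{\beta_j^2}{\lambda_j} + \lambda_j\right), \qquad \Omega(\beta) \;=\; \inf_{\lambda \in \Lambda}\; \frac{1}{2}\sum_j \left(\frac{\beta_j^2}{\lambda_j} + \lambda_j\right).$$

When $\Lambda$ is the entire positive orthant, $\Omega$ reduces to the $\ell_1$ norm; restricting $\Lambda$, for instance by coupling the $\lambda_j$ of coefficients that belong to the same group, is how structured sparsity patterns can be encoded.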
Trading off Mistakes and Don't-Know Predictions
Sayedi, Amin, Zadimoghaddam, Morteza, Blum, Avrim
We discuss an online learning framework in which the agent is allowed to say ``I don't know'' on a given example, as well as to make incorrect predictions. We analyze the trade-off between saying ``I don't know'' and making mistakes. If the number of don't-know predictions is forced to be zero, the model reduces to the well-known mistake-bound model introduced by Littlestone [Lit88]. On the other hand, if no mistakes are allowed, the model reduces to the KWIK framework introduced by Li et al. [LLW08]. We propose a general, though inefficient, algorithm for arbitrary finite concept classes that minimizes the number of don't-know predictions when a certain number of mistakes is allowed. We then present specific polynomial-time algorithms for the concept classes of monotone disjunctions and linear separators.
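A toy illustration of the trade-off for a finite concept class (an illustrative sketch in the spirit of the framework, not the paper's algorithm): predict with the majority vote of the hypotheses still consistent with the past examples, and abstain only when the vote is too close to call. Lowering the abstention margin trades don't-know predictions for potential mistakes.

```python
# Toy version-space learner illustrating the mistake vs. don't-know trade-off.
# Illustrative sketch only; not the algorithm from the paper.

def predict(version_space, x, abstain_margin=0.25):
    """Majority vote over consistent hypotheses; abstain when the vote is close."""
    votes = [h(x) for h in version_space]
    frac_positive = sum(votes) / len(votes)
    if abs(frac_positive - 0.5) < abstain_margin:
        return None                      # "I don't know"
    return frac_positive >= 0.5

def update(version_space, x, y):
    """Discard hypotheses inconsistent with the revealed label."""
    return [h for h in version_space if h(x) == y]
```

With abstain_margin set to 0.5 the learner abstains unless the consistent hypotheses are unanimous (no mistakes, many don't-knows, as in KWIK); with abstain_margin set to 0 it never abstains and behaves like the halving algorithm in the mistake-bound model.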
More data means less inference: A pseudo-max approach to structured learning
Sontag, David, Meshi, Ofer, Globerson, Amir, Jaakkola, Tommi S.
The problem of learning to predict structured labels is of key importance in many applications. However, for general graph structures, both learning and inference in this setting are intractable. Here we show that it is possible to circumvent this difficulty when the input distribution is rich enough, via a method similar in spirit to pseudo-likelihood. We show that our new method is consistent, and illustrate empirically that it indeed performs as well as exact methods when sufficiently large training sets are used.
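One concrete way to read the pseudo-likelihood analogy (a sketch of the general idea; the paper's precise formulation and conditions differ in detail) is to relax the exponentially many margin constraints of max-margin structured learning,

$$w^\top \phi(x, y) \;\ge\; w^\top \phi(x, y') + \Delta(y, y') \qquad \text{for all labelings } y',$$

to constraints only over labelings $y'$ that differ from the observed labeling $y$ in a single coordinate. Each relaxed constraint then involves one output variable at a time, so no global inference is needed during learning, and the consistency question becomes whether, for a rich enough input distribution, the relaxed constraints still pin down the correct parameters.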
Probabilistic Belief Revision with Structural Constraints
Jones, Peter, Saligrama, Venkatesh, Mitter, Sanjoy
Experts (human or computer) are often required to assess the probability of uncertain events. When a collection of experts independently assess events that are structurally interrelated, the resulting assessment may violate fundamental laws of probability. Such an assessment is termed incoherent. In this work we investigate how the problem of incoherence may be affected by allowing experts to specify likelihood models and then update their assessments based on the realization of a globally observable random sequence.
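As an illustrative example of incoherence (not taken from the paper): if one expert assesses $P(A) = 0.7$ and another assesses $P(B) = 0.8$, then any joint assessment must satisfy the elementary bound

$$P(A \cap B) \;\ge\; P(A) + P(B) - 1 \;=\; 0.5,$$

so a third assessment of $P(A \cap B) = 0.2$ for the structurally related event $A \cap B$ is incoherent: no single probability measure is consistent with all three numbers.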
Basis Construction from Power Series Expansions of Value Functions
This paper explores links between basis construction methods in Markov decision processes and power series expansions of value functions. This perspective provides a useful framework to analyze properties of existing bases, as well as insight into constructing more effective bases. Krylov and Bellman error bases are based on the Neumann series expansion. These bases incur very large initial Bellman errors, and can converge rather slowly as the discount factor approaches unity. The Laurent series expansion, which relates the discounted and average-reward formulations, provides both an explanation for this slow convergence and a way to construct more efficient basis representations. The first two terms in the Laurent series represent the scaled average reward and the average-adjusted sum of rewards, and subsequent terms expand the discounted value function using powers of a generalized inverse, called the Drazin (or group) inverse, of a singular matrix derived from the transition matrix. Experiments show that Drazin bases converge considerably more quickly than several other bases, particularly for large values of the discount factor. An incremental variant of Drazin bases, called Bellman average-reward bases (BARBs), is described, which provides some of the same benefits at lower computational cost.
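For reference, the two expansions can be written as follows (standard forms, with notation chosen here for illustration rather than taken from the paper). The Neumann series underlying Krylov and Bellman error bases expands the discounted value function in powers of the transition matrix $P$,

$$V_\gamma \;=\; (I - \gamma P)^{-1} R \;=\; \sum_{t=0}^{\infty} \gamma^t P^t R,$$

which converges slowly as $\gamma \to 1$. The Laurent-style expansion instead organizes $V_\gamma$ around the average-reward limit,

$$V_\gamma \;=\; \frac{g}{1-\gamma} \;+\; h \;+\; o(1) \quad \text{as } \gamma \to 1,$$

where $g$ is the gain (average reward) vector and $h$ the bias (average-adjusted sum of rewards); the higher-order terms involve increasing powers of the Drazin (group) inverse mentioned in the abstract.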
Over-complete representations on recurrent neural networks can support persistent percepts
Druckmann, Shaul, Chklovskii, Dmitri B.
A striking aspect of cortical neural networks is the divergence of a relatively small number of input channels from the peripheral sensory apparatus into a large number of cortical neurons, an over-complete representation strategy. Cortical neurons are then connected by a sparse network of lateral synapses. Here we propose that such an architecture may increase the persistence of the representation of an incoming stimulus, or percept. We demonstrate that for a family of networks in which the receptive field of each neuron is re-expressed by its outgoing connections, a represented percept can remain constant despite changing activity. We term networks with this choice of connectivity REceptive FIeld REcombination (REFIRE) networks. The sparse REFIRE network may serve as a high-dimensional integrator and a biologically plausible model of the local cortical circuit.
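A minimal linear sketch of this persistence property (an illustration of the idea, not the paper's exact REFIRE connectivity or dynamics): if the percept is decoded as $x = Da$ from an over-complete code $a$, then lateral dynamics built from the receptive fields via a pseudo-inverse leave $Da$ invariant while the activity $a$ itself keeps changing.

```python
# Persistence of a decoded percept under changing activity: a linear toy model.
# The connectivity below (built from the pseudo-inverse of the receptive fields D)
# is an illustrative assumption, not the paper's exact REFIRE construction.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20                              # 5 input channels, 20 neurons (over-complete)
D = rng.standard_normal((d, n))           # decoding matrix: receptive fields as outputs
W = -np.eye(n) + np.linalg.pinv(D) @ D    # lateral connectivity; note D @ W = 0

a = rng.standard_normal(n)                # initial population activity
x0 = D @ a                                # initial percept
dt = 0.01
for _ in range(1000):
    a = a + dt * (W @ a)                  # recurrent dynamics: da/dt = W a

print(np.linalg.norm(D @ a - x0))         # percept unchanged (up to numerical error)
```

In this cartoon the activity relaxes toward the row space of D while the decoded percept D a stays fixed, which is the sense in which such a network can act as a high-dimensional integrator.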
Transduction with Matrix Completion: Three Birds with One Stone
Goldberg, Andrew, Recht, Ben, Xu, Junming, Nowak, Robert, Zhu, Jerry
We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.
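The inner workhorse of such nuclear-norm solvers is singular-value shrinkage. Below is a minimal sketch of a fixed-point, proximal-gradient style iteration for recovering a low-rank matrix from its observed entries under a plain squared loss; the step size, stopping rule, and single loss function are illustrative simplifications, not the modified fixed-point continuation method or the mixed feature/label losses used in the paper.

```python
# Singular-value shrinkage and a simple fixed-point iteration for matrix
# completion with squared loss on the observed entries.  Illustrative sketch only.
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(M_obs, mask, mu=1.0, step=1.0, n_iters=500):
    """mask[i, j] is True where M_obs[i, j] is observed (zeros elsewhere)."""
    Z = np.zeros_like(M_obs)
    for _ in range(n_iters):
        grad = mask * (Z - M_obs)              # gradient of 0.5*||mask*(Z - M_obs)||_F^2
        Z = svt(Z - step * grad, step * mu)    # gradient step, then nuclear-norm prox
    return Z
```

In the transductive setting of the abstract, the matrix stacks feature columns and label columns for all items, so completing it simultaneously imputes missing features and predicts the unspecified labels.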
Learning concept graphs from text with stick-breaking priors
Chambers, America, Smyth, Padhraic, Steyvers, Mark
We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents, a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (such as hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs.
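As a reminder of the basic building block (the standard stick-breaking construction, not the paper's full prior over graphs), the weights are generated by repeatedly breaking off Beta-distributed fractions of a unit-length stick:

```python
# Standard (truncated) stick-breaking construction of mixture weights.
# Generic building block only, not the paper's prior over concept graphs.
import numpy as np

def stick_breaking_weights(alpha, truncation, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=truncation)      # fraction broken off at each step
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining                           # weight_k = v_k * prod_{j<k} (1 - v_j)

print(stick_breaking_weights(alpha=2.0, truncation=10))
```

Smaller values of the concentration parameter alpha put most of the mass on the first few components, which is what allows the number of effective components, here the concepts in the graph, to be inferred from data rather than fixed in advance.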
Two-Layer Generalization Analysis for Ranking Using Rademacher Average
Chen, Wei, Liu, Tie-Yan, Ma, Zhi-Ming
This paper is concerned with the generalization analysis of learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents per query. Previous generalization analyses for ranking, however, have not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of algorithms. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d. sampling of documents per query. Such a sampling scheme better describes the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning-to-rank algorithms. It is challenging to perform such an analysis, however, because the documents associated with different queries are not identically distributed, and the documents associated with the same query are no longer independent once they are represented by features extracted from the matching between document and query. To tackle this challenge, we decompose the generalization error according to the two layers, and make use of the new concept of the two-layer Rademacher average. The generalization bounds we obtain are quite intuitive and are in accordance with previous empirical studies on the performance of ranking algorithms.
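For reference, the classical (one-layer) empirical Rademacher average of a function class $\mathcal{F}$ over a sample $x_1,\dots,x_n$ is

$$\hat{\mathcal{R}}_n(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\left[\,\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\right],$$

with independent Rademacher signs $\sigma_i \in \{-1, +1\}$. Loosely speaking (the paper's definition is more refined), the two-layer variant applies such an average once at the query level and once, conditionally, at the document level within each query, so that the resulting bound reflects both the number of queries and the number of documents per query in the training data.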