AITopics

In transfer learning we aim to solve new problems using fewer examples using information gained from solving related problems. Transfer learning has been successful in practice, and extensive PAC analysis of these methods has been developed. Howeverit is not yet clear how to define relatedness between tasks. This is considered as a major problem as it is conceptually troubling and it makes it unclear how much information to transfer and when and how to transfer it. In this paper we propose to measure the amount of information one task contains about another using conditional Kolmogorov complexity between the tasks. We show how existing theory neatly solves the problem of measuring relatedness and transferring the'right' amount of information in sequential transfer learning in a Bayesian setting. The theory also suggests that, in a very formal and precise sense, no other reasonable transfer method can do much better than our Kolmogorov Complexity theoretic transfer method, and that sequential transfer is always justified. Wealso develop a practical approximation to the method and use it to transfer information between 8 arbitrarily chosen databases from the UCI ML repository.

experiment, information, kolmogorov complexity, (15 more...)

Country:

North America > United States > Illinois (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Overview (0.34)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Mahdaviani, Maryam, Choudhury, Tanzeem

Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

We present a new and efficient semi-supervised training method for parameter estimation and feature selection in conditional random fields (CRFs). In real-world applications such as activity recognition, unlabeled sensor traces are relatively easy to obtain whereas labeled examples are expensive and tedious to collect. Furthermore, the ability to automatically select a small subset of discriminatory features from a large pool can be advantageous in terms of computational speed as well as accuracy. In this paper, we introduce the semi-supervised virtual evidence boosting (sVEB) algorithm for training CRFs -- a semi-supervised extension to the recently developed virtual evidence boosting (VEB) method for feature selection and parameter learning. Semi-supervised VEB takes advantage of the unlabeled data via minimum entropy regularization -- the objective function combines the unlabeled conditional entropy with labeled conditional pseudo-likelihood. The sVEB algorithm reduces the overall system cost as well as the human labeling cost required during training, which are both important considerations in building real world inference systems. In a set of experiments on synthetic data and real activity traces collected from wearable sensors, we illustrate that our algorithm benefits from both the use of unlabeled data and automatic feature selection, and outperforms other semi-supervised training approaches.

algorithm, feature selection, unlabeled data, (15 more...)

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Li, Ping, Hastie, Trevor J.

A Unified Near-Optimal Estimator For Dimension Reduction in $l_\alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections

Many tasks (e.g., clustering) in machine learning only require the l

estimator, random projection, stable random projection, (12 more...)

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.42)

Kolter, J. Z., Abbeel, Pieter, Ng, Andrew Y.

Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

We consider apprenticeship learning--learning from expert demonstrations--in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching aquadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly nontrivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeshiplearning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories.This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion overextreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work.

apprenticeship, constraint, terrain, (14 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kato, Tsuyoshi, Kashima, Hisashi, Sugiyama, Masashi, Asai, Kiyoshi

Multi-Task Learning via Conic Programming

When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as \emph{uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.

constraint, mtl-svm, task network, (14 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Health & Medicine (0.46)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Huang, Jonathan, Guestrin, Carlos, Guibas, Leonidas J.

Efficient Inference for Distributions on Permutations

In this paper, we use the "low-frequency"

coefficient, fourier transform, representation, (15 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Hoffman, Matthew, Doucet, Arnaud, Freitas, Nando D., Jasra, Ajay

Bayesian Policy Learning with Trans-Dimensional MCMC

Continuous state-space Markov Decision Processes (MDPs) are notoriously difficult to solve.

algorithm, mcmc, probability distribution, (16 more...)

Country:

North America > Canada > British Columbia (0.05)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)

Globerson, Amir, Jaakkola, Tommi S.

Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations

We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to max-product but unlike max-product it always converges, and can be proven to find the exact MAP solution in various settings. The algorithm is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights. We also describe a generalization of the method to cluster based potentials. The new method is tested on synthetic and real-world problems, and compares favorably with previous approaches.

algorithm, constraint, lp relaxation, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Oceania > Fiji (0.05)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

Gao, Dashan, Mahadevan, Vijay, Vasconcelos, Nuno

The discriminant center-surround hypothesis for bottom-up saliency

The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produce background subtraction algorithms that outperform the state-of-the-art in computer vision.

discriminant saliency, hypothesis, saliency, (15 more...)

Country: North America > United States > California > San Diego County > San Diego (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Foo, Chuan-sheng, Do, Chuong B., Ng, Andrew Y.

Efficient multiple hyperparameter learning for log-linear models

Using multiple regularization hyperparameters is an effective method for managing model complexity in problems where input features have varying amounts of noise. While algorithms for choosing multiple hyperparameters are often used in neural networks and support vector machines, they are not common in structured prediction tasks, such as sequence labeling or parsing. In this paper, we consider the problem of learning regularization hyperparameters for log-linear models, a class of probabilistic models for structured prediction tasks which includes conditional random fields (CRFs). Using an implicit differentiation trick, we derive an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters. In both simulations and the real-world task of computational RNA secondary structure prediction, we find that multiple hyperparameter learning provides a significant boost in accuracy compared to models learned using only a single regularization hyperparameter.

algorithm, hyperparameter, log-linear model, (14 more...)