AITopics

Country: North America > United States (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Pasula, Hanna, Marthi, Bhaskara, Milch, Brian, Russell, Stuart J., Shpitser, Ilya

Identity Uncertainty and Citation Matching

Identity uncertainty is a pervasive problem in real-world data analysis. It arises whenever objects are not labeled with unique identifiers or when those identifiers may not be perceived perfectly. In such cases, two observations may or may not correspond to the same object. In this paper, we consider the problem in the context of citation matching--the problem of deciding which citations correspond to the same publication. Our approach is based on the use of a relational probability model to define a generative model for the domain, including models of author and title corruption and a probabilistic citation grammar. Identity uncertainty is handled by extending standard models to incorporate probabilities over the possible mappings between terms in the language and objects in the domain. Inference is based on Markov chain Monte Carlo, augmented with specific methods for generating efficient proposals when the domain contains many objects. Results on several citation data sets show that the method outperforms current algorithms for citation matching. The declarative, relational nature of the model also means that our algorithm can determine object characteristics such as author names by combining multiple citations of multiple papers.

Chakrabartty, Shantanu, Cauwenberghs, Gert

Forward-Decoding Kernel-Based Phone Recognition

Forward decoding kernel machines (FDKM) combine large-margin classifiers with hidden Markov models (HMM) for maximum a posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model trained with a recursive scheme that deals effectively with noisy and partially labeled data. Training over very large data sets is accomplished using a sparse probabilistic support vector machine (SVM) model based on quadratic entropy, and an online stochastic steepest descent algorithm. For speaker-independent continuous phone recognition, FDKM trained over 177,080 samples of the TlMIT database achieves 80.6% recognition accuracy over the full test set, without use of a prior phonetic language model. 1 Introduction Sequence estimation is at the core of many problems in pattern recognition, most notably speech and language processing. Recognizing dynamic patterns in sequential data requires a set of tools very different from classifiers trained to recognize static patterns in data assumed i.i.d.

artificial intelligence, machine learning, probability, (15 more...)

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

Derivative Observations in Gaussian Process Models of Dynamic Systems

Solak, E., Murray-smith, R., Leithead, W. E., Leith, D. J., Rasmussen, Carl E.

Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data.

artificial intelligence, function observation, machine learning, (13 more...)

Country: Europe > United Kingdom (0.29)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Modeling & Simulation (0.92)

Moghaddam, Baback, Shakhnarovich, Gregory

Boosted Dyadic Kernel Discriminants

We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer's conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an online and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy).

artificial intelligence, classifier, machine learning, (17 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Ijspeert, Auke J., Nakanishi, Jun, Schaal, Stefan

Learning Attractor Landscapes for Learning Motor Primitives

Many control problems take place in continuous state-action spaces, e.g., as in manipulator robotics, where the control objective is often defined as finding a desired trajectory that reaches a particular goal state. While reinforcement learning offers a theoretical framework to learn such control policies from scratch, its applicability to higher dimensional continuous state-action spaces remains rather limited to date. Instead of learning from scratch, in this paper we suggest to learn a desired complex control policy by transforming an existing simple canonical control policy. For this purpose, we represent canonical policies in terms of differential equations with well-defined attractor properties. By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system. We demonstrate our techniques in the context of learning a set of movement skills for a humanoid robot from demonstrations of a human teacher. Policies are acquired rapidly, and, due to the properties of well formulated differential equations, can be reused and modified online under dynamic changes of the environment. The linear parameterization of nonparametric regression moreover lends itself to recognize and classify previously learned movement skills.

artificial intelligence, machine learning, trajectory, (16 more...)

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.36)

Ghaoui, Laurent E., Jordan, Michael I., Lanckriet, Gert R.

Robust Novelty Detection with Single-Class MPM

This algorithm-the "single-class minimax probability machine (MPM)"- is built on a distribution-free methodology that minimizes the worst-case probability of a data point falling outside of a convex set, given only the mean and covariance matrix of the distribution and making no further distributional assumptions. We present a robust approach to estimating the mean and covariance matrix within the general two-class MPM setting, and show how this approach specializes to the single-class problem. We provide empirical results comparing the single-class MPM to the single-class SVM and a two-class SVM method. 1 Introduction Novelty detection is an important unsupervised learning problem in which test data are to be judged as having been generated from the same or a different process as that which generated the training data.

artificial intelligence, covariance matrix, data mining, (15 more...)

Country: North America > United States (0.28)

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Fischer, Bernd, Schumann, Johann, Buntine, Wray, Gray, Alexander G.

Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

Machine learning has reached a point where many probabilistic methods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized for different models.

Fukumizu, Kenji, Akaho, Shotaro, Amari, Shun-ichi

Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting

We show the existence of critical points as lines for the likelihood function of mixture-type models. They are given by embedding of a critical point for models with less components. A sufficient condition that the critical line gives local maxima or saddle points is also derived. Based on this fact, a component-split method is proposed for a mixture of Gaussian components, and its effectiveness is verified through experiments.

artificial intelligence, machine learning, mixture model, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Japan > Honshū > Kantō (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Cesa-bianchi, Nicolò, Conconi, Alex, Gentile, Claudio

Margin-Based Algorithms for Information Filtering

In this work, we study an information filtering model where the relevance labels associated to a sequence of feature vectors are realizations of an unknown probabilistic linear function. Building on the analysis of a restricted version of our model, we derive a general filtering rule based on the margin of a ridge regression estimator. While our rule may observe the label of a vector only by classfying the vector as relevant, experiments on a real-world document filtering problem show that the performance of our rule is close to that of the online classifier which is allowed to observe all labels. These empirical results are complemented by a theoretical analysis where we consider a randomized variant of our rule and prove that its expected number of mistakes is never much larger than that of the optimal filtering rule which knows the hidden linear model.

algorithm, artificial intelligence, machine learning, (18 more...)