AITopics

We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets, thus potentially overcoming two of the biggest hurdles with GP models.

artificial intelligence, covariance function, machine learning, (17 more...)

Country:

Europe > United Kingdom (0.28)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Rangarajan, Anand, Yuille, Alan L.

MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

Yuille's algorithm is based on a certain decomposition of the Bethe

artificial intelligence, free energy, machine learning, (15 more...)

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.42)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.42)

Pelillo, Marcello

Matching Free Trees with Replicator Equations

Motivated by our recent work on rooted tree matching, in this paper we provide a solution to the problem of matching two free (i.e., unrooted) trees by constructing an association graph whose maximal cliques are in one-to-one correspondence with maximal common subtrees. We then solve the problem using simple replicator dynamics from evolutionary game theory. Experiments on hundreds of uniformly random trees are presented. The results are impressive: despite the inherent inability of these simple dynamics to escape from local optima, they always returned a globally optimal solution.
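As a rough illustration of the replicator dynamics mentioned in this abstract, the following Python sketch runs the standard discrete-time replicator update on the adjacency matrix of a small hand-made graph and reads a clique off the support of the limit point. The graph `ADJ`, the function name `replicator_clique`, the step count, and the tolerance are all assumptions made for this example; the paper's association-graph construction for free trees is not reproduced here.

```python
# Hypothetical example graph: a triangle {0, 1, 2} plus a pendant vertex 3
# attached to vertex 2. Its largest clique is {0, 1, 2}.
ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def replicator_clique(adj, steps=200, tol=1e-6):
    """Run discrete-time replicator dynamics from the barycenter of the simplex.

    Update rule: x_i <- x_i * (A x)_i / (x^T A x).
    Returns the vertices whose weight stays above `tol`, and the final vector.
    """
    n = len(adj)
    x = [1.0 / n] * n  # start at the barycenter
    for _ in range(steps):
        payoff = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        average = sum(x[i] * payoff[i] for i in range(n))
        if average == 0:  # graph without edges: nothing to update
            break
        x = [x[i] * payoff[i] / average for i in range(n)]
    support = [i for i in range(n) if x[i] > tol]
    return support, x


clique, weights = replicator_clique(ADJ)
print(clique)
```

On this small graph the weight of vertex 3 decays toward zero and the remaining mass spreads evenly over the triangle. In general these dynamics are a local method, so the clique they return depends on the starting point and the graph.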

artificial intelligence, clique, optimization problem, (15 more...)

Country:

Europe (0.47)
North America > United States > Massachusetts (0.14)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Ng, Andrew Y., Jordan, Michael I.

On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes

Discriminative classifiers model the posterior p(y|x) directly, or learn a direct map from inputs x to the class labels. There are several compelling reasons for using discriminative rather than generative classifiers, one of which, succinctly articulated by Vapnik [6], is that "one should solve the [classification] problem directly and never solve a more general problem as an intermediate step [such as modeling p(x|y)]." Indeed, leaving aside computational issues and matters such as handling missing data, the prevailing consensus seems to be that discriminative classifiers are almost always to be preferred to generative ones. Another piece of prevailing folk wisdom is that the number of examples needed to fit a model is often roughly linear in the number of free parameters of a model. This has its theoretical basis in the observation that for "many" models, the VC dimension is roughly linear or at most some low-order polynomial in the number of parameters (see, e.g., [1, 3]), and it is known that sample complexity in the discriminative setting is linear in the VC dimension [6]. In this paper, we study empirically and theoretically the extent to which these beliefs are true. A parametric family of probabilistic models p(x, y) can be fit either to optimize the joint likelihood of the inputs and the labels, or fit to optimize the conditional likelihood p(y|x), or even fit to minimize the 0-1 training error obtained by thresholding p(y|x) to make predictions.
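To make the two fitting criteria in this abstract concrete, the sketch below fits a naive Bayes model by maximizing the joint likelihood (counting with Laplace smoothing) and a logistic regression model by gradient ascent on the conditional likelihood p(y|x). The four-point dataset, the function names, the learning rate, and the step count are assumptions chosen for illustration; none of the paper's experiments or results are reproduced.

```python
import math

# Toy data (an assumption for this example): the label equals the first feature.
DATA = [((1, 1), 1), ((1, 0), 1), ((0, 1), 0), ((0, 0), 0)]


def fit_naive_bayes(data, n_features=2):
    """Generative fit: estimate p(y) and p(x_j = 1 | y) with Laplace smoothing."""
    model = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        prior = (len(rows) + 1) / (len(data) + 2)
        cond = [(sum(x[j] for x in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)]
        model[label] = (prior, cond)
    return model


def predict_naive_bayes(model, x):
    """Pick the label with the larger joint log-probability log p(x, y)."""
    def log_joint(label):
        prior, cond = model[label]
        return math.log(prior) + sum(
            math.log(p if v else 1.0 - p) for p, v in zip(cond, x))
    return 1 if log_joint(1) > log_joint(0) else 0


def fit_logistic(data, steps=500, lr=0.5):
    """Discriminative fit: batch gradient ascent on the conditional log-likelihood."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in data:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # model's p(y = 1 | x)
            for j in range(n):
                grad_w[j] += (y - p) * x[j]
            grad_b += y - p
        w = [wi + lr * g for wi, g in zip(w, grad_w)]
        b += lr * grad_b
    return w, b


def predict_logistic(params, x):
    """Threshold p(y = 1 | x) at 0.5, i.e. check the sign of the linear score."""
    w, b = params
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


nb_model = fit_naive_bayes(DATA)
lr_model = fit_logistic(DATA)
for x, y in DATA:
    print(x, y, predict_naive_bayes(nb_model, x), predict_logistic(lr_model, x))
```

Both models share the same linear decision rule family on binary features; they differ only in the objective used to choose the parameters, which is the distinction the abstract draws.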

artificial intelligence, logistic regression, machine learning, (14 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report > New Finding (0.56)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Murphy, Kevin P., Paskin, Mark A.

Linear-time inference in Hierarchical HMMs

The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98].

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Littman, Michael L., Kearns, Michael J., Singh, Satinder P.

An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games

The algorithm is the first to compute equilibria both efficiently and exactly for a nontrivial class of graphical games.

1 Introduction

Seeking to replicate the representational and computational benefits that graphical models have provided to probabilistic inference, several recent works have introduced graph-theoretic frameworks for the study of multi-agent systems (LaMura 2000; Koller and Milch 2001; Kearns et al. 2001). In the simplest of these formalisms, each vertex represents a single agent, and the edges represent pairwise interaction between agents. As with many familiar network models, the macroscopic behavior of a large system is thus implicitly described by its local interactions, and the computational challenge is to extract the global states of interest. Classical game theory is typically used to model multi-agent interactions, and the global states of interest are thus the so-called Nash equilibria, in which no agent has a unilateral incentive to deviate. In a recent paper (Kearns et al. 2001), we introduced such a graphical formalism for multi-agent game theory, and provided two algorithms for computing Nash equilibria when the underlying graph is a tree (or is sufficiently sparse).

artificial intelligence, breakpoint, breakpoint policy, (15 more...)

Country: North America > United States > Pennsylvania (0.28)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Langford, John, Caruana, Rich

(Not) Bounding the True Error

We present a new approach to bounding the true error rate of a continuous-valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound.

artificial intelligence, machine learning, neural network, (13 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Lanckriet, Gert, Ghaoui, Laurent E., Bhattacharyya, Chiranjib, Jordan, Michael I.

Minimax Probability Machine

When constructing a classifier, the probability of correct classification of future data points should be maximized. In the current paper this desideratum is translated in a very direct way into an optimization problem, which is solved using methods from convex optimization. We also show how to exploit Mercer kernels in this setting to obtain nonlinear decision boundaries. A worst-case bound on the probability of misclassification of future data is obtained explicitly.

1 Introduction

Consider the problem of choosing a linear discriminant by minimizing the probabilities that data vectors fall on the wrong side of the boundary. One way to attempt to achieve this is via a generative approach in which one makes distributional assumptions about the class-conditional densities and thereby estimates and controls the relevant probabilities. The need to make distributional assumptions, however, casts doubt on the generality and validity of such an approach, and in discriminative solutions to classification problems it is common to attempt to dispense with class-conditional densities entirely.

artificial intelligence, machine learning, optimization problem, (15 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.16)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Kohlmorgen, Jens, Lemm, Steven

A Dynamic HMM for On-line Segmentation of Sequential Data

We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an online variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e., the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: Europe > Germany (0.14)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)

Kivinen, Jyrki, Smola, Alex J., Williamson, Robert C.

Online Learning with Kernels

We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization.
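The abstract does not spell out its update equations, so the following is only a minimal sketch of what an online kernel classifier can look like: a stochastic-gradient step on a regularized hinge loss, with the hypothesis stored as a kernel expansion over past inputs. The class name `OnlineKernelClassifier`, the RBF kernel, and the parameters `eta`, `lam`, `margin`, and `gamma` are assumptions for this example, and the ν-trick mentioned above is not implemented.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


class OnlineKernelClassifier:
    """Hypothetical online learner: f(x) = sum_i alpha_i * k(x_i, x)."""

    def __init__(self, eta=0.5, lam=0.01, margin=1.0, gamma=1.0):
        self.eta = eta        # learning rate
        self.lam = lam        # regularization strength
        self.margin = margin  # required margin for the hinge loss
        self.gamma = gamma    # RBF kernel width parameter
        self.points = []      # stored inputs x_i
        self.alphas = []      # their expansion coefficients alpha_i

    def decision(self, x):
        return sum(a * rbf(p, x, self.gamma)
                   for p, a in zip(self.points, self.alphas))

    def update(self, x, y):
        """Process one labeled example (y in {-1, +1}); return the prior score."""
        score = self.decision(x)
        # The regularizer shrinks every existing coefficient.
        shrink = 1.0 - self.eta * self.lam
        self.alphas = [a * shrink for a in self.alphas]
        # A margin violation adds the new point to the expansion.
        if y * score < self.margin:
            self.points.append(x)
            self.alphas.append(self.eta * y)
        return score


clf = OnlineKernelClassifier()
for _ in range(10):
    clf.update((0.0, 0.0), -1)
    clf.update((2.0, 2.0), +1)
print(clf.decision((0.0, 0.0)), clf.decision((2.0, 2.0)))
```

Each example is seen once and costs one pass over the stored expansion, which is what makes this style of update attractive for streaming data. A practical version would also cap or truncate the expansion, since it otherwise grows with every margin violation.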

artificial intelligence, data mining, machine learning, (17 more...)

Country: North America > United States (0.15)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.58)