AITopics

Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).

dimension, fisher score, nuisance dimension, (15 more...)

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Berlin (0.04)
(2 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Dynamic Bayesian Networks with Deterministic Latent Tables

Barber, David

The application of latent/hidden variable Dynamic Bayesian Networks is constrained by the complexity of marginalising over latent variables. For this reason either small latent dimensions or Gaussian latent conditional tables linearly dependent on past states are typically considered in order that inference is tractable. We suggest an alternative approach in which the latent variables are modelled using deterministic conditional probability tables. This specialisation has the advantage of tractable inference even for highly complex nonlinear/non-Gaussian visible conditional probability tables. This approach enables the consideration of highly complex latent dynamics whilst retaining the benefits of a tractable probabilistic model.

bayesian network, dynamic bayesian network, sequence, (15 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Fischer, Bernd, Schumann, Johann, Buntine, Wray, Gray, Alexander G.

Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

Machine learning has reached a point where many probabilistic methods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized for different models.

algorithm, probability, schema, (13 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Kandola, Jaz, Cristianini, Nello, Shawe-taylor, John S.

Learning Semantic Similarity

The standard representation of text documents as bags of words suffers from well known limitations, mostly due to its inability to exploit semantic similarity between terms. Attempts to incorporate some notion of term similarity include latent semantic indexing [8], the use of semantic networks [9], and probabilistic methods [5]. In this paper we propose two methods for inferring such similarity from a corpus. The first one defines word-similarity based on document-similarity and viceversa, giving rise to a system of equations whose equilibrium point we use to obtain a semantic similarity measure. The second method models semantic relations by means of a diffusion process on a graph defined by lexicon and co-occurrence information.

kernel, matrix, similarity, (14 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Rosset, Saharon, Segal, Eran

Boosting Density Estimation

Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. We apply gradient-based boosting methodology to the unsupervised learning problem of density estimation. We show convergence properties of the algorithm and prove that a strength of weak learnability property applies to this problem as well. We illustrate the potential of this approach through experiments with boosting Bayesian networks to learn density models.

algorithm, bayesian network, weak learner, (14 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Education (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Law, Martin H., Jain, Anil K., Figueiredo, Mário

Feature Selection in Mixture-Based Clustering

There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami's mutual-informationbased feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a "wrapper", since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.

algorithm, feature selection, selection, (12 more...)

Country:

North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)
North America > United States > Michigan > Ingham County > Lansing (0.04)
(4 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Lange, Tilman, Braun, Mikio L., Roth, Volker, Buhmann, Joachim M.

Stability-Based Model Selection

Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is crossvalidation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.

learning, stability, stability measure, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Industry: Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)

Herbrich, Ralf, Lawrence, Neil D., Seeger, Matthias

Fast Sparse Gaussian Process Methods: The Informative Vector Machine

We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements.

approximation, greedy selection, selection, (15 more...)

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Modeling & Simulation (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Cortes, Corinna, Haffner, Patrick, Mohri, Mehryar

Rational Kernels

We introduce a general family of kernels based on weighted transducers or rational relations, rational kernels, that can be used for analysis of variable-length sequences or more generally weighted automata, in applications such as computational biology or speech recognition. We show that rational kernels can be computed efficiently using a general algorithm of composition of weighted transducers and a general single-source shortest-distance algorithm. We also describe several general families of positive definite symmetric rational kernels. These general kernels can be combined with Support Vector Machines to form efficient and powerful techniques for spoken-dialog classification: highly complex kernels become easy to design and implement and lead to substantial improvements in the classification accuracy. We also show that the string kernels considered in applications to computational biology are all specific instances of rational kernels.

kernel, rational kernel, transducer, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.05)
North America > United States > California (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Adaptive Nonlinear System Identification with Echo State Networks

Jaeger, Herbert

Echo state networks (ESN) are a novel approach to recurrent neural network training. An ESN consists of a large, fixed, recurrent "reservoir" network, from which the desired output is obtained by training suitable output connection weights. Determination of optimal output weights becomes a linear, uniquely solvable task of MSE minimization. This article reviews the basic ideas and describes an online adaptation scheme based on the RLS algorithm known from adaptive linear systems. As an example, a 10th order NARMA system is adaptively identified.

algorithm, esn, output weight, (14 more...)

Country:

Europe > Germany > Bremen > Bremen (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Overview (0.94)
Research Report (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)