AITopics

We derive a learning algorithm for inferring an overcomplete basis by viewing it as probabilistic model of the observed data. Overcomplete bases allow for better approximation of the underlying statistical density. Using a Laplacian prior on the basis coefficients removes redundancy and leads to representations that are sparse and are a nonlinear function of the data. This can be viewed as a generalization of the technique of independent component analysis and provides a method for blind source separation of fewer mixtures than sources. We demonstrate the utility of overcomplete representations on natural speech and show that compared to the traditional Fourier basis the inferred representations potentially have much greater coding efficiency.

algorithm, basis vector, representation, (10 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.49)

Kiviluoto, Kimmo, Oja, Erkki

S-Map: A Network with a Simple Self-Organization Algorithm for Generative Topographic Mappings

The S-Map is a network with a simple learning algorithm that combines the self-organization capability of the Self-Organizing Map (SOM) and the probabilistic interpretability of the Generative Topographic Mapping (GTM). The simulations suggest that the S Map algorithm has a stronger tendency to self-organize from random initial configuration than the GTM. The S-Map algorithm can be further simplified to employ pure Hebbian learning, without changing the qualitative behaviour of the network. 1 Introduction The self-organizing map (SOM; for a review, see [1]) forms a topographic mapping from the data space onto a (usually two-dimensional) output space. The SOM has been succesfully used in a large number of applications [2]; nevertheless, there are some open theoretical questions, as discussed in [1, 3]. Most of these questions arise because of the following two facts: the SOM is not a generative model, i.e. it does not generate a density in the data space, and it does not have a well-defined objective function that the training process would strictly minimize.

algorithm, gtm, s-map, (14 more...)

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.05)
Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > New York (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Hofmann, Thomas, Buhmann, Joachim M.

Active Data Clustering

Active data clustering is a novel technique for clustering of proximity data which utilizes principles from sequential experiment design in order to interleave data generation and data analysis. The proposed active data sampling strategy is based on the expected value of information, a concept rooting in statistical decision theory. This is considered to be an important step towards the analysis of largescale data sets, because it offers a way to overcome the inherent data sparseness of proximity data.

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Hofmann, Reimar, Tresp, Volker

Nonlinear Markov Networks for Continuous Variables

We address the problem oflearning structure in nonlinear Markov networks with continuous variables. This can be viewed as non-Gaussian multidimensional density estimation exploiting certain conditional independencies in the variables. Markov networks are a graphical way of describing conditional independencies well suited to model relationships which do not exhibit a natural causal ordering. We use neural network structures to model the quantitative relationships between variables. The main focus in this paper will be on learning the structure for the purpose of gaining insight into the underlying process. Using two data sets we show that interesting structures can be found using our approach. Inference will be briefly addressed.

boston housing data, markov boundary, markov network, (12 more...)

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > Japan (0.04)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Held, Marcus, Buhmann, Joachim M.

Unsupervised On-line Learning of Decision Trees for Hierarchical Data Analysis

An adaptive online algorithm is proposed to estimate hierarchical data structures for non-stationary data sources. The approach is based on the principle of minimum cross entropy to derive a decision tree for data clustering and it employs a metalearning idea (learning to learn) to adapt to changes in data characteristics. Its efficiency is demonstrated by grouping non-stationary artifical data and by hierarchical segmentation of LANDSAT images. 1 Introduction Unsupervised learning addresses the problem to detect structure inherent in unlabeled and unclassified data. N. The encoding usually is represented by an assignment matrix M (Mia), where Mia 1 if and only if Xi belongs to cluster L: 1 MiaV (Xi, Ya) measures the quality of a data partition, Le., optimal assignments and prototypes (M,y)OPt argminM,y1i (M,Y) minimize the inhomogeneity of clusters w.r.t. a given distance measure V. For reasons of simplicity we restrict the presentation to the ' sum-of-squared-error criterion V(x, y) To facilitate this minimization a deterministic annealing approach was proposed in [5] signments, which maps the discrete optimization problem, i.e. how to determine the data as via the Maximum Entropy Principle [2] to a continuous parameter es- Unsupervised Online Learning of Decision Trees for Data Analysis 515 timation problem.

data source, learning, prototype, (13 more...)

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Instructional Material > Online (0.41)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Goldberg, Paul W., Williams, Christopher K. I., Bishop, Christopher M.

Regression with Input-dependent Noise: A Gaussian Process Treatment

Gaussian processes provide natural nonparametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.

gaussian process, input-dependent noise, noise level, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Feraud, Raphaël, Bernier, Olivier

Ensemble and Modular Approaches for Face Detection: A Comparison

A new learning model based on autoassociative neural networks is developped and applied to face detection. To extend the detection ability in orientation and to decrease the number of false alarms, different combinations of networks are tested: ensemble, conditional ensemble and conditional mixture of networks. The use of a conditional mixture of networks allows to obtain state of the art results on different benchmark face databases. The set of all possible windows is E V uN, with V n N 0. Since collecting a representative set of non-face examples is impossible, face detection by a statistical model is a difficult task. An autoassociative network, using five layers of neurons, is able to perform a nonlinear dimensionnality reduction [Kramer, 1991].

conditional mixture, ensemble, face detector, (11 more...)

Country:

Europe > France (0.05)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Frey, Brendan J., MacKay, David J. C.

A Revolution: Belief Propagation in Graphs with Cycles

Until recently, artificial intelligence researchers have frowned upon the application of probability propagation in Bayesian belief networks that have cycles. The probability propagation algorithm is only exact in networks that are cycle-free. However, it has recently been discovered that the two best error-correcting decoding algorithms are actually performing probability propagation in belief networks with cycles. 1 Communicating over a noisy channel Our increasingly wired world demands efficient methods for communicating bits of information over physical channels that introduce errors. Examples of real-world channels include twisted-pair telephone wires, shielded cable-TV wire, fiberoptic cable, deep-space radio, terrestrial radio, and indoor radio. Engineers attempt to correct the errors introduced by the noise in these channels through the use of channel coding which adds protection to the information source, so that some channel errors can be corrected.

bayesian network, information bit, probability propagation, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > Illinois (0.04)
(4 more...)

Industry:

Media > Television (0.54)
Leisure & Entertainment (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Freitas, João F. G. de, Niranjan, Mahesan, Gee, Andrew H.

Regularisation in Sequential Learning Algorithms

In this paper, we discuss regularisation in online/sequential learning algorithms. In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. Further, bootstrapping to determine a confidence level is not practical. To surmount these problems, a minimum variance estimation approach that makes use of the extended Kalman algorithm for training multi-layer perceptrons is employed. The novel contribution of this paper is to show the theoretical links between extended Kalman filtering, Sutton's variable learning rate algorithms and Mackay's Bayesian estimation framework. In doing so, we propose algorithms to overcome the need for heuristic choices of the initial conditions and noise covariance matrices in the Kalman approach.

algorithm, ekf algorithm, posterior density function, (15 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.06)
North America > United States > California > San Mateo County > San Mateo (0.04)
Africa > South Africa (0.04)

Burger, Matthias, Graepel, Thore, Obermayer, Klaus

An Annealed Self-Organizing Map for Source Channel Coding

It is especially suited for speech and image data which in many applieations have to be transmitted under low bandwidth/high noise level conditions. Following the idea of (Farvardin, 1990) and (Luttrell, 1989) of jointly optimizing the codebook and the data representation w.r.t. to a given channel noise we apply a deterministic annealing scheme (Rose, 1990; Buhmann, 1997) to the problem and develop a An Annealed Self-Organizing Map for Source Channel Coding 431 soft topographic vector quantization algorithm (STVQ) (cf.

self-organizing map, ssom, stvq, (12 more...)

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > Germany > Berlin (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)