Sparsity of Data Representation of Optimal Kernel Machine and Leave-one-out Estimator

Neural Information Processing Systems

Vapnik's result that the expectation of the generalisation error of the optimal hyperplane is bounded by the expectation of the ratio of the number of support vectors to the number of training examples is extended to a broad class of kernel machines. The class includes Support Vector Machines for soft margin classification and regression, and Regularization Networks with a variety of kernels and cost functions. We show that key inequalities in Vapnik's result become equalities once "the classification error" is replaced by "the margin error", with the latter defined as an instance with positive cost. In particular we show that the expectations of the true margin error and the empirical margin error are equal, and that sparse solutions for kernel machines are possible only if the cost function is "partially" insensitive.

1 Introduction

Minimization of regularized risk is a backbone of several recent advances in machine learning, including Support Vector Machines (SVM) [13], Regularization Networks (RN) [5] and Gaussian Processes [15]. Such a machine is typically implemented as a weighted sum of a kernel function evaluated on pairs composed of the data vector in question and a number of selected training vectors, the so-called support vectors.
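The bound referred to above relates the expected generalisation (leave-one-out) error to the expected fraction of support vectors. A minimal sketch of that relationship on synthetic data, assuming scikit-learn (the dataset, kernel, and hyperparameters are illustrative, not the paper's experiments):

# Sketch: compare the support-vector fraction of a soft-margin SVM to its
# leave-one-out error, illustrating the bound E[LOO error] <= E[#SV / n].
# Assumes scikit-learn; dataset and hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
sv_ratio = len(clf.support_) / len(X)        # number of support vectors / n

loo_acc = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=LeaveOneOut()).mean()
print(f"support-vector ratio: {sv_ratio:.3f}  leave-one-out error: {1 - loo_acc:.3f}")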


Beyond Maximum Likelihood and Density Estimation: A Sample-Based Criterion for Unsupervised Learning of Complex Models

Neural Information Processing Systems

Two well-known classes of unsupervised procedures that can be cast in this manner are generative and recoding models. In a generative unsupervised framework, the environment generates training examples, which we will refer to as observations, by sampling from one distribution; the other distribution is embodied in the model. Examples of generative frameworks are mixtures of Gaussians (MoG) [2], factor analysis [4], and Boltzmann machines [8]. In the recoding unsupervised framework, the model transforms points from an observation space to an output space, and the output distribution is compared either to a reference distribution or to a distribution derived from the output distribution. An example is independent component analysis (ICA) [11], a method that discovers a representation of vector-valued observations in which the statistical dependence among the vector elements in the output space is minimized.
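A minimal sketch of the two framings on toy data, assuming scikit-learn: a mixture of Gaussians as the generative model and FastICA as the recoding model (data and settings are illustrative only):

# Sketch: the two unsupervised framings on toy data.
# Generative: a mixture of Gaussians models the observation distribution itself.
# Recoding:   FastICA maps observations to outputs that are made as
#             statistically independent as possible.
# Assumes scikit-learn; data and settings are illustrative.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 2))              # independent non-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])       # mixing matrix
X = S @ A.T                                  # observed mixtures

gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
print("MoG average log-likelihood:", gmm.score(X))       # generative fit

Y = FastICA(n_components=2, random_state=0).fit_transform(X)
print("correlation of recoded outputs:\n", np.corrcoef(Y.T))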


Natural Sound Statistics and Divisive Normalization in the Auditory System

Neural Information Processing Systems

We explore the statistical properties of natural sound stimuli preprocessed with a bank of linear filters. The responses of such filters exhibit a striking form of statistical dependency, in which the response variance of each filter grows with the response amplitude of filters tuned for nearby frequencies. These dependencies may be substantially reduced using an operation known as divisive normalization, in which the response of each filter is divided by a weighted sum of the rectified responses of other filters. The weights may be chosen to maximize the independence of the normalized responses for an ensemble of natural sounds. We demonstrate that the resulting model accounts for nonlinearities in the response characteristics of the auditory nerve, by comparing model simulations to electrophysiological recordings.
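A minimal sketch of the normalization operation itself, with fixed illustrative weights rather than weights optimized for independence over natural sounds as in the paper (here squaring plays the role of rectification):

# Sketch: divisive normalization of a bank of linear filter responses.
# Each squared response is divided by a weighted sum of the squared responses
# of the other filters plus a constant.  The weights below are fixed
# placeholders; the paper chooses them to maximize independence of the
# normalized responses over an ensemble of natural sounds.
import numpy as np

def divisive_normalization(responses, weights, sigma=1.0):
    """responses: (n_filters, n_samples) linear filter outputs."""
    energy = responses ** 2
    pool = weights @ energy + sigma ** 2     # weighted sum over the other filters
    return energy / pool

rng = np.random.default_rng(0)
r = rng.normal(size=(8, 1000))               # toy filter responses
w = np.full((8, 8), 0.1)
np.fill_diagonal(w, 0.0)                     # each filter is normalized by the others
print(divisive_normalization(r, w).shape)    # (8, 1000) normalized responses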


Some New Bounds on the Generalization Error of Combined Classifiers

Neural Information Processing Systems

In this paper we develop the method of bounding the generalization error of a classifier in terms of its margin distribution, which was introduced in the recent papers of Bartlett and of Schapire, Freund, Bartlett and Lee. The theory of Gaussian and empirical processes allows us to prove margin-type inequalities for very general function classes, the complexity of a class being measured via so-called Gaussian complexity functions. As a simple application of our results, we obtain the bounds of Schapire, Freund, Bartlett and Lee for the generalization error of boosting. We also substantially improve the results of Bartlett on bounding the generalization error of neural networks in terms of the l1-norms of the weights of neurons. Furthermore, under additional assumptions on the complexity of the class of hypotheses we provide some tighter bounds, which in the case of boosting improve the results of Schapire, Freund, Bartlett and Lee.
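Schematically, margin-type bounds of this kind take the following shape (a sketch only, with constants, confidence terms and lower-order factors suppressed; not the paper's exact statement): with high probability, for every f in the class \mathcal{F} and every margin \delta > 0,

\Pr\{\, y f(x) \le 0 \,\} \;\le\; \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{\, y_i f(x_i) \le \delta \,\} \;+\; \frac{C}{\delta}\, G_n(\mathcal{F}) \;+\; \text{(confidence term)},

where G_n(\mathcal{F}) denotes the Gaussian complexity of \mathcal{F}: the empirical margin error at scale \delta replaces the classification error, and the complexity penalty scales inversely with the margin.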


Automatic Choice of Dimensionality for PCA

Neural Information Processing Systems

A central issue in principal component analysis (PCA) is choosing the number of principal components to be retained. By interpreting PCA as density estimation, we show how to use Bayesian model selection to estimate the true dimensionality of the data. The resulting estimate is simple to compute yet guaranteed to pick the correct dimensionality, given enough data. The estimate involves an integral over the Stiefel manifold of k-frames, which is difficult to compute exactly. But after choosing an appropriate parameterization and applying Laplace's method, an accurate and practical estimator is obtained. In simulations, it is convincingly better than cross-validation and other proposed algorithms, plus it runs much faster.
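As an illustration only (not the paper's own code): scikit-learn's PCA exposes an automatic dimensionality choice, n_components='mle', based on this style of evidence approximation, and a toy run recovers the latent dimensionality:

# Sketch: automatic choice of PCA dimensionality on toy data.  scikit-learn's
# PCA(n_components="mle") implements an evidence-based choice in this spirit;
# shown as an illustration, not as the paper's own implementation.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))                  # true dimensionality: 5
W = rng.normal(size=(5, 20))
X = latent @ W + 0.1 * rng.normal(size=(500, 20))   # 20-d noisy observations

pca = PCA(n_components="mle").fit(X)
print("estimated dimensionality:", pca.n_components_)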


Structure Learning in Human Causal Induction

Neural Information Processing Systems

We use graphical models to explore the question of how people learn simple causal relationships from data. The two leading psychological theories can both be seen as estimating the parameters of a fixed graph. We argue that a complete account of causal induction should also consider how people learn the underlying causal graph structure, and we propose to model this inductive process as Bayesian inference. Our argument is supported through the discussion of three data sets.

1 Introduction

Causality plays a central role in human mental life. Our behavior depends upon our understanding of the causal structure of our environment, and we are remarkably good at inferring causation from mere observation. Constructing formal models of causal induction is currently a major focus of attention in computer science [7], psychology [3,6], and philosophy [5]. This paper attempts to connect these literatures by framing the debate between two major psychological theories in the computational language of graphical models. We show that existing theories equate human causal induction with maximum likelihood parameter estimation on a fixed graphical structure, and we argue that to fully account for human behavioral data, we must also postulate that people make Bayesian inferences about the underlying causal graph structure itself.
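A minimal sketch of the contrast between structure inference and parameter estimation, on a single 2x2 contingency experiment: two candidate graphs are compared by their marginal likelihoods. The uniform priors, noisy-OR parameterization, and grid integration are illustrative assumptions, not necessarily the paper's exact choices.

# Sketch: Bayesian comparison of two causal graph structures for a single 2x2
# contingency experiment (candidate cause C present/absent vs. effect E
# present/absent).  Graph 0: only a background cause influences E.  Graph 1:
# the background cause and C both influence E (combined with a noisy-OR).
import numpy as np

w = np.linspace(1e-3, 1 - 1e-3, 200)                 # grid over causal strengths

def log_marginal_graph0(e_c1, n_c1, e_c0, n_c0):
    # P(E) = w0 whether or not C is present; integrate the likelihood over w0.
    lik = w ** (e_c1 + e_c0) * (1 - w) ** (n_c1 - e_c1 + n_c0 - e_c0)
    return np.log(np.trapz(lik, w))

def log_marginal_graph1(e_c1, n_c1, e_c0, n_c0):
    # Noisy-OR: P(E | C=1) = w0 + w1 - w0*w1, P(E | C=0) = w0.
    W0, W1 = np.meshgrid(w, w, indexing="ij")
    p1 = W0 + W1 - W0 * W1
    lik = (p1 ** e_c1 * (1 - p1) ** (n_c1 - e_c1)
           * W0 ** e_c0 * (1 - W0) ** (n_c0 - e_c0))
    return np.log(np.trapz(np.trapz(lik, w, axis=1), w))

# 6 of 8 trials show the effect with the cause present, 2 of 8 without it.
support = log_marginal_graph1(6, 8, 2, 8) - log_marginal_graph0(6, 8, 2, 8)
print("log Bayes factor in favour of a C -> E link:", round(support, 2))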


Kernel Expansions with Unlabeled Examples

Neural Information Processing Systems

Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for exploiting unlabeled examples in discriminative classification. This is achieved essentially by expanding the input vectors into longer feature vectors via both labeled and unlabeled examples. The resulting classification method can be interpreted as a discriminative kernel density estimate and is readily trained via the EM algorithm, which in this case is discriminative and attains the optimal solution. We provide, in addition, a purely discriminative formulation of the estimation problem by appealing to the maximum entropy framework. We demonstrate that the proposed approach requires very few labeled examples for high classification accuracy.
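A minimal sketch of the expansion step only, assuming scikit-learn: each input is represented by its kernel values against both labeled and unlabeled points, and a linear classifier is fit on the few labeled examples. This illustrates the representation, not the paper's EM-based or maximum-entropy estimation; dataset and kernel width are illustrative.

# Sketch of the kernel expansion: represent every input by its kernel values
# against all (labeled and unlabeled) training points, then fit a linear
# classifier on the few labeled examples.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
labeled = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])

Phi = rbf_kernel(X, X, gamma=2.0)            # expanded feature vectors over all points
clf = LogisticRegression(max_iter=1000).fit(Phi[labeled], y[labeled])
print("accuracy on all 300 points:", clf.score(Phi, y))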


Competition and Arbors in Ocular Dominance

Neural Information Processing Systems

Hebbian and competitive Hebbian algorithms are almost ubiquitous in modeling pattern formation in cortical development. We analyse in theoretical detail a particular model (adapted from Piepenbrock & Obermayer, 1999) for the development of 1d stripe-like patterns, which places competitive and interactive cortical influences, and free and restricted initial arborisation, on a common footing.

1 Introduction

Cats, many species of monkeys, and humans exhibit ocular dominance stripes, which are alternating areas of primary visual cortex devoted to input from (the thalamic relay associated with) just one or the other eye (see Erwin et al., 1995; Miller, 1996; Swindale, 1996 for reviews of theory and data). These well-known fingerprint patterns have been a seductive target for models of cortical pattern formation because of the mix of competition and cooperation they suggest. A wealth of synaptic adaptation algorithms has been suggested to account for them (and also the concomitant refinement of the topography of the map between the eyes and the cortex), many of which are based on forms of Hebbian learning. Critical issues for the models are the degree of correlation between inputs from the eyes, the nature of the initial arborisation of the axonal inputs, the degree and form of cortical competition, and the nature of synaptic saturation (preventing weights from changing sign or getting too large) and normalisation (allowing cortical and/or thalamic cells to support only a certain total synaptic weight).
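A toy correlation-based sketch in the spirit of such Hebbian models (a generic illustration, not the specific model analysed in the paper): the left-minus-right drive to each cortical unit grows under a Mexican-hat cortical interaction and saturates, yielding alternating stripes of eye dominance.

# Toy 1d ocular dominance sketch.  w_diff[i] is the left-minus-right drive to
# cortical unit i; a Mexican-hat interaction amplifies a preferred spatial
# frequency, so the sign of w_diff comes to alternate in stripes once growth
# saturates.  All parameters are illustrative.
import numpy as np

n = 100
x = np.arange(n)
dist = np.abs(x[:, None] - x[None, :])
dist = np.minimum(dist, n - dist)                              # circular distance
K = np.exp(-dist**2 / (2 * 3.0**2)) - 0.5 * np.exp(-dist**2 / (2 * 9.0**2))

rng = np.random.default_rng(0)
w_diff = 0.01 * rng.normal(size=n)                             # small random bias
for _ in range(200):
    w_diff = np.clip(w_diff + 0.01 * K @ w_diff, -1.0, 1.0)    # growth + saturation

print("".join("L" if v > 0 else "R" for v in w_diff))          # stripe pattern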


Feature Correspondence: A Markov Chain Monte Carlo Approach

Neural Information Processing Systems

When trying to recover 3D structure from a set of images, the most difficult problem is establishing the correspondence between the measurements. Most existing approaches assume that features can be tracked across frames, whereas methods that exploit rigidity constraints to facilitate matching do so only under restricted camera motion. In this paper we propose a Bayesian approach that avoids the brittleness associated with singling out one "best" correspondence, and instead considers the distribution over all possible correspondences. We treat both a fully Bayesian approach that yields a posterior distribution, and a MAP approach that makes use of EM to maximize this posterior. We show how Markov chain Monte Carlo methods can be used to implement these techniques in practice, and present experimental results on real data.
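A minimal sketch of sampling over correspondences: the state is a permutation matching features between two point sets, and a Metropolis step proposes swapping two assignments. The isotropic Gaussian likelihood below is a placeholder assumption, not the structure-from-motion likelihood used in the paper.

# Sketch: Metropolis sampling over correspondences between two point sets.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 15, 0.05
pts1 = rng.uniform(size=(n, 2))
true_perm = rng.permutation(n)
pts2 = pts1[true_perm] + sigma * rng.normal(size=(n, 2))   # matched, noisy, shuffled

def log_lik(perm):
    return -np.sum((pts2 - pts1[perm]) ** 2) / (2 * sigma**2)

perm = np.arange(n)
best = perm.copy()
for _ in range(20000):
    i, j = rng.integers(n, size=2)
    prop = perm.copy()
    prop[i], prop[j] = prop[j], prop[i]                    # swap two assignments
    if np.log(rng.random()) < log_lik(prop) - log_lik(perm):
        perm = prop
        if log_lik(perm) > log_lik(best):
            best = perm.copy()

print("fraction of correct assignments:", np.mean(best == true_perm))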


Minimum Bayes Error Feature Selection for Continuous Speech Recognition

Neural Information Processing Systems

Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ, where θ denotes the linear feature-space projection being designed. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.

1 Introduction

Modern speech recognition systems use cepstral features characterizing the short-term spectrum of the speech signal for classifying frames into phonetic classes. These features are augmented with dynamic information from the adjacent frames to capture transient spectral events in the signal.
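A minimal two-class sketch of the Bhattacharyya-bound criterion (the paper works with many Gaussian classes, a union bound, and a θ-averaged divergence on speech features; the random search over 1d projections below is purely illustrative):

# Two-class sketch: among random 1d projections, pick the one minimizing the
# Bhattacharyya upper bound on the Bayes error between the projected Gaussian
# class densities.  Data and search procedure are illustrative.
import numpy as np

def bhattacharyya_bound(x0, x1, p0=0.5, p1=0.5):
    m0, m1, v0, v1 = x0.mean(), x1.mean(), x0.var(), x1.var()
    db = (0.25 * (m0 - m1) ** 2 / (v0 + v1)
          + 0.5 * np.log((v0 + v1) / (2 * np.sqrt(v0 * v1))))
    return np.sqrt(p0 * p1) * np.exp(-db)      # upper bound on the Bayes error

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], [1.0, 3.0], size=(500, 2))    # class 0
X1 = rng.normal([1.5, 0.0], [1.0, 3.0], size=(500, 2))    # class 1

directions = [rng.normal(size=2) for _ in range(200)]
bounds = [bhattacharyya_bound(X0 @ d, X1 @ d) for d in directions]
k = int(np.argmin(bounds))
print("minimum Bhattacharyya bound:", round(bounds[k], 3))
print("best direction (normalized):", directions[k] / np.linalg.norm(directions[k]))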