A Neural Probabilistic Language Model
Bengio, Yoshua, Ducharme, Réjean, Vincent, Pascal
A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.

1 Introduction
A fundamental problem that makes language modeling and other learning problems difficult is the curse of dimensionality. It is particularly obvious when one wants to model the joint distribution between many discrete random variables (such as words in a sentence, or discrete attributes in a data-mining task).
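A minimal sketch of the architecture the abstract describes, with toy sizes (the vocabulary, embedding, context, and hidden dimensions below are illustrative assumptions, not the paper's settings): each word maps to a learned vector, the vectors of the context words are concatenated, and a tanh layer followed by a softmax yields the distribution over the next word.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, n_ctx, h = 1000, 30, 3, 50   # vocab size, embedding dim, context length, hidden size

C = rng.normal(0, 0.1, (V, d))          # shared word embeddings (the distributed representation)
H = rng.normal(0, 0.1, (h, n_ctx * d))  # hidden-layer weights
U = rng.normal(0, 0.1, (V, h))          # output weights

def next_word_probs(context_ids):
    """P(w_t | previous n_ctx words) over the whole vocabulary."""
    x = C[context_ids].reshape(-1)       # concatenated context embeddings
    a = np.tanh(H @ x)                   # hidden representation
    logits = U @ a
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

p = next_word_probs([5, 42, 7])
print(p.shape, p.sum())                  # (1000,) and ~1.0
```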
Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
Wainwright, Martin J., Sudderth, Erik B., Willsky, Alan S.
We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity.
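A toy sketch of the fixed-point idea behind the iteration described above, under the assumption that solving the linear system J x = h for the conditional means suffices to illustrate it: the off-tree ("cut") edges are moved to the right-hand side and a tree-structured system is solved exactly on each sweep. The tree solve here is a dense np.linalg.solve for brevity, whereas the actual algorithm uses fast exact inference on the spanning tree; the example matrix is made up.

```python
import numpy as np

def embedded_tree_solve(J, h, tree_mask, iters=100):
    J_tree = np.where(tree_mask, J, 0.0)   # keep diagonal + spanning-tree edges
    K = J_tree - J                          # contribution of the cut edges
    x = np.zeros_like(h)
    for _ in range(iters):
        x = np.linalg.solve(J_tree, h + K @ x)  # exact solve on the tree system
    return x

# 3-node cycle: cutting the (0, 2) edge leaves a chain, which is a tree.
J = np.array([[ 2.0, -0.5, -0.5],
              [-0.5,  2.0, -0.5],
              [-0.5, -0.5,  2.0]])
h = np.array([1.0, 0.0, -1.0])
mask = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=bool)
x = embedded_tree_solve(J, h, mask)
print(np.allclose(x, np.linalg.solve(J, h)))    # True: converges to the exact means
```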
Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics
Natschläger, Thomas, Maass, Wolfgang, Sontag, Eduardo D., Zador, Anthony M.
Experimental data show that biological synapses behave quite differently from the symbolic synapses in common artificial neural network models. Biological synapses are dynamic, i.e., their "weight" changes on a short time scale by several hundred percent, depending on the past input to the synapse. In this article we explore the consequences that these synaptic dynamics entail for the computational power of feedforward neural networks. We show that gradient descent suffices to approximate a given (quadratic) filter by a rather small neural system with dynamic synapses. We also compare our network model to artificial neural networks designed for time series processing. Our numerical results are complemented by theoretical analysis which shows that even with just a single hidden layer such networks can approximate a surprisingly large class of nonlinear filters: all filters that can be characterized by Volterra series. This result is robust with regard to various changes in the model for synaptic dynamics.
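For concreteness, a worked example of the kind of target filter the abstract mentions: a discrete second-order (quadratic) Volterra filter. The kernels h1 and h2 below are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

def volterra2(x, h1, h2):
    """y[t] = sum_i h1[i] x[t-i] + sum_{i,j} h2[i,j] x[t-i] x[t-j]."""
    m = len(h1)
    y = np.zeros(len(x))
    for t in range(len(x)):
        window = x[max(0, t - m + 1):t + 1][::-1]     # x[t], x[t-1], ...
        w = np.pad(window, (0, m - len(window)))      # zero-pad at the start
        y[t] = h1 @ w + w @ h2 @ w                    # linear + quadratic terms
    return y

rng = np.random.default_rng(0)
h1 = np.array([0.5, 0.3, 0.1])          # first-order (linear) kernel
h2 = 0.05 * rng.normal(size=(3, 3))     # second-order (quadratic) kernel
x = rng.normal(size=10)
print(volterra2(x, h1, h2))
```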
Rate-coded Restricted Boltzmann Machines for Face Recognition
Teh, Yee Whye, Hinton, Geoffrey E.
We describe a neurally inspired, unsupervised learning algorithm that builds a nonlinear generative model for pairs of face images from the same individual. Individuals are then recognized by finding the highest relative probability pair among all pairs that consist of a test image and an image whose identity is known. Our method compares favorably with other methods in the literature. The generative model consists of a single layer of rate-coded, nonlinear feature detectors and it has the property that, given a data vector, the true posterior probability distribution over the feature detector activities can be inferred rapidly without iteration or approximation. The weights of the feature detectors are learned by comparing the correlations of pixel intensities and feature activations in two phases: when the network is observing real data and when it is observing reconstructions of real data generated from the feature activations.
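A minimal sketch of the two-phase correlation rule the abstract describes, written for a plain binary RBM rather than the paper's rate-coded units; the sizes and data below are toy placeholders. Weights move toward pixel-feature correlations measured on real data and away from the same correlations measured on one-step reconstructions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, v0, lr=0.01):
    h0 = sigmoid(v0 @ W)                         # feature probabilities on real data
    h_sample = (rng.random(h0.shape) < h0) * 1.0 # stochastic feature activations
    v1 = sigmoid(h_sample @ W.T)                 # one-step reconstruction
    h1 = sigmoid(v1 @ W)                         # features on the reconstruction
    # positive-phase correlations minus negative-phase correlations
    return W + lr * (v0.T @ h0 - v1.T @ h1) / len(v0)

W = rng.normal(0, 0.01, (64, 16))                # 64 "pixels", 16 feature detectors
batch = (rng.random((32, 64)) < 0.3) * 1.0       # toy binary image batch
W = cd1_step(W, batch)
```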
Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes
Lee, Te-Won, Wachtler, Thomas, Sejnowski, Terrence J.
The human visual system encodes the chromatic signals conveyed by the three types of retinal cone photoreceptors in an opponent fashion. This color opponency has been shown to constitute an efficient encoding by spectral decorrelation of the receptor signals. We analyze the spatial and chromatic structure of natural scenes by decomposing the spectral images into a set of linear basis functions such that they constitute a representation with minimal redundancy. Independent component analysis finds the basis functions that transform the spatiochromatic data such that the outputs (activations) are statistically as independent as possible, i.e. least redundant. The resulting basis functions show strong opponency along an achromatic direction (luminance edges), along a blue-yellow direction, and along a red-blue direction.
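A toy sketch of the decomposition pipeline, with synthetic mixed sources standing in for spectral image patches (the data and dimensions are assumptions): FastICA recovers a mixing matrix whose columns play the role of the basis functions, and whose activations are as statistically independent as possible.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)),        # independent non-Gaussian sources
          np.sin(5 * t),
          rng.laplace(size=t.size)]
A = rng.normal(size=(3, 3))              # unknown mixing (plays the role of the basis)
X = S @ A.T                              # observed "patch" data

ica = FastICA(n_components=3, random_state=0)
activations = ica.fit_transform(X)       # maximally independent outputs
basis = ica.mixing_                      # estimated basis functions (columns)
print(basis.shape)                       # (3, 3)
```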
Finding the Key to a Synapse
Natschläger, Thomas, Maass, Wolfgang
Experimental data have shown that synapses are heterogeneous: different synapses respond with different sequences of amplitudes of postsynaptic responses to the same spike train. Neither the role of synaptic dynamics itself nor the role of the heterogeneity of synaptic dynamics for computations in neural circuits is well understood. We present in this article methods that make it feasible to compute, for a given synapse with known synaptic parameters, the spike train that is optimally fitted to the synapse, for example in the sense that it produces the largest sum of postsynaptic responses. To our surprise we find that most of these optimally fitted spike trains match common firing patterns of specific types of neurons that are discussed in the literature.

1 Introduction
A large number of experimental studies have shown that biological synapses have an inherent dynamics, which controls how the pattern of amplitudes of postsynaptic responses depends on the temporal pattern of the incoming spike train. Various quantitative models have been proposed involving a small number of characteristic parameters that allow us to predict the response of a given synapse to a given spike train once proper values for these characteristic synaptic parameters have been found. The analysis of this article is based on the model of [1], where three parameters U, F, D control the dynamics of a synapse and a fourth parameter A - which corresponds to the synaptic "weight" in static synapse models - scales the absolute sizes of the postsynaptic responses. The resulting model predicts the amplitude Ak for the kth spike in a spike train with interspike intervals (ISIs) Δ1, ..., Δk-1.
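A sketch of one common form of the model of [1], i.e. the facilitation/depression recursions of the Markram-Tsodyks family; exact conventions vary across papers, so treat the update equations and the default parameter values below as assumptions rather than the paper's exact specification.

```python
import numpy as np

def synaptic_amplitudes(isi, U=0.5, F=0.2, D=0.5, A=1.0):
    """Predicted amplitudes A_k = A * u_k * R_k for a spike train with ISIs isi (seconds)."""
    u, R = U, 1.0                        # facilitation and depression state at the first spike
    amps = [A * u * R]
    for dt in isi:                       # dt: interval to the next spike
        decay = np.exp(-dt / F)
        u = u * decay + U * (1.0 - u * decay)          # facilitation recursion
        R = R * (1.0 - u) * np.exp(-dt / D) + 1.0 - np.exp(-dt / D)  # depression recursion
        amps.append(A * u * R)
    return np.array(amps)

# Regular 20 Hz train: with these parameters the responses typically depress.
print(synaptic_amplitudes(np.full(5, 0.05)))
```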
Place Cells and Spatial Navigation Based on 2D Visual Feature Extraction, Path Integration, and Reinforcement Learning
Arleo, Angelo, Smeraldi, Fabrizio, Hug, Stéphane, Gerstner, Wulfram
Visual input, provided by a video camera on a miniature robot, is preprocessed by a set of Gabor filters on 31 nodes of a log-polar retinotopic graph. Unsupervised Hebbian learning is employed to incrementally build a population of localized overlapping place fields. Place cells serve as basis functions for reinforcement learning. Experimental results for goal-oriented navigation of a mobile robot are presented.
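A toy sketch of the incremental construction of overlapping place fields described above; the radial-basis activation, the recruitment threshold, and the Hebbian winner update are illustrative assumptions rather than the paper's exact rules.

```python
import numpy as np

class PlaceCellLayer:
    def __init__(self, sigma=1.0, recruit_threshold=0.4, lr=0.1):
        self.prototypes = []             # one stored feature vector per place cell
        self.sigma, self.thr, self.lr = sigma, recruit_threshold, lr

    def activations(self, features):
        return np.array([np.exp(-np.sum((features - p) ** 2) / (2 * self.sigma ** 2))
                         for p in self.prototypes])

    def observe(self, features):
        act = self.activations(features)
        if len(act) == 0 or act.max() < self.thr:
            self.prototypes.append(features.copy())    # recruit a new place cell
        else:                                           # Hebbian update of the winner
            w = int(act.argmax())
            self.prototypes[w] += self.lr * (features - self.prototypes[w])

rng = np.random.default_rng(0)
layer = PlaceCellLayer()
for _ in range(200):
    layer.observe(rng.normal(size=8))    # stand-in for Gabor filter responses
print(len(layer.prototypes))             # number of recruited place cells
```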
Data Clustering by Markovian Relaxation and the Information Bottleneck Method
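Tishby, Naftali, Slonim, Noam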
We introduce a new, nonparametric and principled, distance-based clustering method. This method combines a pairwise-distance approach with a vector-quantization method, which provides a meaningful interpretation of the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examining the decay of mutual information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation and are then extracted using the information bottleneck method.
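A minimal sketch of the first steps described in the abstract, with illustrative parameter choices: the distance matrix is turned into a row-stochastic Markov matrix, the chain is relaxed by taking powers, and the decay of the mutual information between the starting point and the state after t steps is tracked.

```python
import numpy as np

def transition_matrix(dist, scale=1.0):
    P = np.exp(-dist / scale)                 # distances -> affinities
    return P / P.sum(axis=1, keepdims=True)   # row-normalize into a Markov matrix

def mutual_information(Pt):
    n = Pt.shape[0]
    joint = Pt / n                            # uniform prior over starting points
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(0)
pts = np.r_[rng.normal(0, 0.3, (10, 2)),      # two well-separated toy clusters
            rng.normal(3, 0.3, (10, 2))]
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
P = transition_matrix(D, scale=0.5)
for t in [1, 2, 4, 8, 16]:                    # information decays as the chain mixes
    print(t, round(mutual_information(np.linalg.matrix_power(P, t)), 3))
```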
Algorithmic Stability and Generalization Performance
Bousquet, Olivier, Elisseeff, André
Until recently, most of the research on generalization bounds has focused on uniform a priori bounds guaranteeing that the difference between the training error and the test error is uniformly small for any hypothesis in a given class. These bounds are usually expressed in terms of combinatorial quantities such as VC-dimension. In the last few years, researchers have tried to use more refined quantities to either estimate the complexity of the search space (e.g. …
Active Learning for Parameter Estimation in Bayesian Networks
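Tong, Simon, Koller, Daphne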
Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we can guide the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.
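A toy sketch of an active-learning loop in the spirit of the abstract, with an assumed query criterion that is not the paper's exact one: maintain Beta posteriors over the conditional parameters P(Y=1 | X=x) for a controllable parent X, and always query the case whose parameter we are most uncertain about (largest posterior variance).

```python
import numpy as np

rng = np.random.default_rng(0)
true_theta = {0: 0.9, 1: 0.55}                   # unknown P(Y=1 | X=x)
counts = {x: [1.0, 1.0] for x in (0, 1)}         # Beta(1, 1) priors: [alpha, beta]

def posterior_variance(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

for _ in range(50):
    x = max(counts, key=lambda x: posterior_variance(*counts[x]))  # choose the query
    y = rng.random() < true_theta[x]             # observe Y for the queried case
    counts[x][0 if y else 1] += 1                # Bayesian update of that parameter

for x, (a, b) in counts.items():
    print(f"P(Y=1|X={x}) ~ {a / (a + b):.2f}")   # posterior means
```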