AITopics

Arguably one of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirected probabilistic models represented as symmetric networks have been studied extensively using methods from statistical mechanics (Hertz et al., 1991). Detailed analyses of these models are possible by exploiting averaging phenomena that occur in the thermodynamic limit of large networks. In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et al., 1997).

marginal probability, node, probability, (14 more...)

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Ikeda, Shiro, Amari, Shun-ichi, Nakahara, Hiroyuki

Convergence of the Wake-Sleep Algorithm

The W-S (Wake-Sleep) algorithm is a simple learning rule for models with hidden variables. It has been shown that this algorithm can be applied to a factor analysis model, which is a linear version of the Helmholtz machine. But even for a factor analysis model, general convergence has not been proved theoretically. In this article, we describe a geometrical understanding of the W-S algorithm in contrast with the EM (Expectation Maximization) algorithm and the em algorithm. As a result, we prove the convergence of the W-S algorithm for the factor analysis model. We also show the condition for convergence in general models.

algorithm, factor analysis model, generative model, (14 more...)

Country: Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Herschkowitz, Didier, Nadal, Jean-Pierre

Unsupervised and Supervised Clustering: The Mutual Information between Parameters and Observations

Recent work in parameter estimation and neural coding has demonstrated that optimal performance is related to the mutual information between parameters and data. We consider the mutual information in the case where the dependency in the parameter (a vector θ) of the conditional p.d.f. of each observation (a vector

calculation, estimator, mutual information, (11 more...)

Country:

Asia > Brunei (0.06)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Ferrari-Trecate, Giancarlo, Williams, Christopher K. I., Opper, Manfred

Finite-Dimensional Approximation of Gaussian Processes

Gaussian process (GP) prediction suffers from O(n^3) scaling with the data set size n. By using a finite-dimensional basis to approximate the GP predictor, the computational complexity can be reduced. We derive optimal finite-dimensional predictors under a number of assumptions, and show the superiority of these predictors over the Projected Bayes Regression method (which is asymptotically optimal). We also show how to calculate the minimal model size for a given n. The calculations are backed up by numerical experiments.
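To make the scaling argument concrete, the following is a minimal NumPy sketch, not the authors' optimal predictor. It contrasts the full GP predictive mean, which solves an n x n linear system, with a predictor restricted to m basis functions, which solves only an m x m system. The cosine basis in `features`, the noise variance, the toy data, and all function names are illustrative assumptions. The kernel is assumed to be exactly the one induced by the basis with a unit Gaussian prior on the weights, so the two predictors should agree up to numerical error.

```python
import numpy as np

def features(x, m):
    """Hypothetical finite basis: m cosine features on [0, 1] (an assumption)."""
    k = np.arange(m)
    return np.cos(np.pi * np.outer(x, k)) / np.sqrt(m)

def gp_predict_full(K_train, K_cross, y, noise_var):
    """Full GP predictive mean: solves an n x n system, O(n^3)."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + noise_var * np.eye(n), y)
    return K_cross @ alpha

def gp_predict_finite(Phi_train, Phi_test, y, noise_var):
    """Finite-dimensional predictive mean: solves an m x m system, O(n m^2 + m^3)."""
    m = Phi_train.shape[1]
    A = Phi_train.T @ Phi_train + noise_var * np.eye(m)
    w = np.linalg.solve(A, Phi_train.T @ y)
    return Phi_test @ w

# Toy data (assumed): noisy samples of a sine wave.
rng = np.random.default_rng(0)
n, m, noise_var = 200, 10, 0.1
x_train = rng.uniform(0.0, 1.0, n)
x_test = np.linspace(0.0, 1.0, 5)
y = np.sin(2 * np.pi * x_train) + np.sqrt(noise_var) * rng.standard_normal(n)

Phi_train, Phi_test = features(x_train, m), features(x_test, m)
# Kernel induced by the basis with a unit Gaussian prior on the weights.
K_train, K_cross = Phi_train @ Phi_train.T, Phi_test @ Phi_train.T

mean_full = gp_predict_full(K_train, K_cross, y, noise_var)
mean_finite = gp_predict_finite(Phi_train, Phi_test, y, noise_var)
```

For a general kernel, a finite basis gives only an approximation, and the quality of that approximation as a function of m and n is the question the abstract addresses.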

eigenfunction, gaussian process, predictor, (15 more...)

Country:

Europe > United Kingdom > England > West Midlands > Birmingham (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Coolen, Anthony C. C., Saad, David

Dynamics of Supervised Learning with Restricted Training Sets

We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian distributions.

equation, order parameter, supervised learning, (11 more...)

Country:

Europe > United Kingdom (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Barber, David, Wiegerinck, Wim

Tractable Variational Structures for Approximating Graphical Models

Graphical models provide a broad probabilistic framework with applications in speech recognition (Hidden Markov Models), medical diagnosis (Belief Networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational framework for approximating these models, we present two classes of distributions, decimatable Boltzmann Machines and Tractable Belief Networks, that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small benchmark problem suggest that these richer approximations compare favorably against others previously reported in the literature. 1 Introduction Graphical models provide a powerful framework for probabilistic inference [1] but suffer intractability when applied to large scale problems.
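As a point of reference for the "standard factorized approach" that the abstract aims to improve on, here is a minimal sketch of the textbook factorized mean-field iteration for a Boltzmann machine, alongside exact enumeration to show where the exponential cost comes from. This is the baseline only, not the decimatable or belief-network approximations of the paper. The ±1 unit coding, the energy convention, the damping factor, and the function names are assumptions made for illustration.

```python
import itertools
import numpy as np

def exact_means(W, b):
    """Exact unit means by enumerating all 2^n states (exponential in n)."""
    n = len(b)
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    # Assumed energy convention: E(s) = -0.5 * s^T W s - b^T s
    log_p = 0.5 * np.einsum("si,ij,sj->s", states, W, states) + states @ b
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    return p @ states

def mean_field_means(W, b, n_iter=200, damping=0.5):
    """Factorized mean field: iterate m_i = tanh(b_i + sum_j W_ij m_j)."""
    m = np.zeros_like(b)
    for _ in range(n_iter):
        m = damping * m + (1.0 - damping) * np.tanh(b + W @ m)
    return m

# Usage: a small network with weak symmetric couplings (values are arbitrary).
n = 4
W = 0.1 * (np.ones((n, n)) - np.eye(n))  # symmetric, zero diagonal
b = np.array([0.3, -0.2, 0.1, 0.0])
gap = np.max(np.abs(exact_means(W, b) - mean_field_means(W, b)))
```

The mean-field update costs O(n^2) per sweep, whereas the exact computation visits 2^n states. The quantity `gap` measures how much accuracy the factorized assumption gives up on this particular toy problem.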

approximation, belief network, tractable variational structure, (16 more...)

Country:

Asia > Middle East > Jordan (0.06)
Europe > Netherlands > Gelderland > Nijmegen (0.05)
North America > United States > Massachusetts (0.04)

Industry: Health & Medicine (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Rinberg, Dmitry, Davidowitz, Hanan, Tishby, Naftali

Multi-Electrode Spike Sorting by Clustering Transfer Functions

Since every electrode is in a different position, it will measure a different contribution from each of the different neurons. Simply stated, the problem is this: how can these complex signals be untangled to determine when each individual cell fired? This problem is difficult because a) the objects being classified are very similar and often noisy, b) spikes coming from the same cell can

*Permanent address: Institute of Computer Science and Center for Neural Computation, The Hebrew University, Jerusalem, Israel.

electrode, spike, transfer function ratio, (15 more...)

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.24)
North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology: Information Technology > Artificial Intelligence (0.90)

Manwani, Amit, Koch, Christof

Signal Detection in Noisy Weakly-Active Dendrites

Here we derive measures quantifying the information loss of a synaptic signal due to the presence of neuronal noise sources, as it electrotonically propagates along a weakly-active dendrite. We model the dendrite as an infinite linear cable, with noise sources distributed along its length. The noise sources we consider are thermal noise, channel noise arising from the stochastic nature of voltage-dependent ionic channels (K and Na) and synaptic noise due to spontaneous background activity. We assess the efficacy of information transfer using a signal detection paradigm where the objective is to detect the presence/absence of a presynaptic spike from the post-synaptic membrane voltage. This allows us to analytically assess the role of each of these noise sources in information transfer. For our choice of parameters, we find that the synaptic noise is the dominant noise source which limits the maximum length over which information can be reliably transmitted. 1 Introduction This is a continuation of our efforts (Manwani and Koch, 1998) to understand the information capacity of a neuronal link (in terms of the specific nature of neural "hardware") by a systematic study of information processing at different biophysical stages in a model of a single neuron. Here we investigate how the presence of neuronal noise sources influences the information transmission capabilities of a simplified model of a weakly-active dendrite. The noise sources we include are thermal noise, channel noise arising from the stochastic nature of voltage-dependent channels (K and Na) and synaptic noise due to spontaneous background activity. We characterize the noise sources using analytical expressions of their current power spectral densities and compare their magnitudes for dendritic parameters reported in the literature (Mainen and Sejnowski, 1998).

dendrite, noise source, signal detection, (13 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Gat, Itay, Tishby, Naftali

Synergy and Redundancy among Brain Cells of Behaving Monkeys

Determining the relationship between the activity of a single nerve cell and that of an entire population is a fundamental question that bears on the basic neural computation paradigms. In this paper we apply an information theoretic approach to quantify the level of cooperative activity among cells in a behavioral context. It is possible to discriminate between synergetic activity of the cells and redundant activity, depending on the difference between the information they provide when measured jointly and the information they provide independently. We define a synergy value that is positive in the first case and negative in the second, and show that the synergy value can be measured by detecting the behavioral mode of the animal from simultaneously recorded activity of the cells. We observe that among cortical cells positive synergy can be found, while cells from the basal ganglia, active during the same task, do not exhibit similar synergetic activity.
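One way to read the synergy value described above is as the joint information minus the sum of the individual informations, I(X1, X2; Y) - I(X1; Y) - I(X2; Y), where X1 and X2 are the responses of two cells and Y is the behavioral mode. The sketch below assumes that reading and computes it from a known joint probability table. The paper itself estimates the quantity from recorded data, which this toy example does not attempt. All names and the two example tables are illustrative assumptions.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in bits from a 2-D joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0  # skip zero-probability cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def synergy(p):
    """Synergy in bits for a joint table p[x1, x2, y]."""
    p = np.asarray(p, dtype=float)
    n1, n2, ny = p.shape
    i_joint = mutual_information(p.reshape(n1 * n2, ny))  # I(X1, X2; Y)
    i_1 = mutual_information(p.sum(axis=1))               # I(X1; Y)
    i_2 = mutual_information(p.sum(axis=0))               # I(X2; Y)
    return i_joint - i_1 - i_2

# Synergetic toy case: Y = X1 XOR X2, with X1 and X2 independent fair bits.
p_xor = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        p_xor[x1, x2, x1 ^ x2] = 0.25

# Redundant toy case: both cells simply copy a fair binary Y.
p_copy = np.zeros((2, 2, 2))
p_copy[0, 0, 0] = 0.5
p_copy[1, 1, 1] = 0.5

syn_xor = synergy(p_xor)    # +1 bit: neither cell alone is informative
syn_copy = synergy(p_copy)  # -1 bit: each cell alone carries the full bit
```

In the XOR case each cell alone tells nothing about Y, yet together they determine it, so the synergy is positive. In the copy case the second cell adds nothing beyond the first, so the synergy is negative.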

information, mutual information, synergy value, (13 more...)