AITopics

Following recent results [9, 8] showing the importance of the fatshattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates the implications of these results for the case of imbalanced datasets and develops two approaches to setting the threshold. The approaches are incorporated into ThetaBoost, a boosting algorithm for dealing with unequal loss functions. The performance of ThetaBoost and the two approaches are tested experimentally.

algorithm, loss function, threshold, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.31)

Ikeda, Shiro, Amari, Shun-ichi, Nakahara, Hiroyuki

Convergence of the Wake-Sleep Algorithm

The W-S (Wake-Sleep) algorithm is a simple learning rule for the models with hidden variables. It is shown that this algorithm can be applied to a factor analysis model which is a linear version of the Helmholtz machine. But even for a factor analysis model, the general convergence is not proved theoretically. In this article, we describe the geometrical understanding of the W-S algorithm in contrast with the EM (Expectation Maximization) algorithm and the em algorithm. As the result, we prove the convergence of the W-S algorithm for the factor analysis model. We also show the condition for the convergence in general models.

algorithm, factor analysis model, generative model, (14 more...)

Country: Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Herschkowitz, Didier, Nadal, Jean-Pierre

Unsupervised and Supervised Clustering: The Mutual Information between Parameters and Observations

Recent works in parameter estimation and neural coding have demonstrated that optimal performance are related to the mutual information between parameters and data. We consider the mutual information in the case where the dependency in the parameter (a vector 8) of the conditional p.d.f. of each observation (a vector

calculation, estimator, mutual information, (11 more...)

Country:

Asia > Brunei (0.06)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Ferrari-Trecate, Giancarlo, Williams, Christopher K. I., Opper, Manfred

Finite-Dimensional Approximation of Gaussian Processes

Gaussian process (GP) prediction suffers from O(n3) scaling with the data set size n. By using a finite-dimensional basis to approximate the GP predictor, the computational complexity can be reduced. We derive optimal finite-dimensional predictors under a number of assumptions, and show the superiority of these predictors over the Projected Bayes Regression method (which is asymptotically optimal). We also show how to calculate the minimal model size for a given n. The calculations are backed up by numerical experiments.

eigenfunction, gaussian process, predictor, (15 more...)

Country:

Europe > United Kingdom > England > West Midlands > Birmingham (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Düring, A., Coolen, Anthony C. C., Sherrington, D.

Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks

We solve the dynamics of Hopfield-type neural networks which store sequences of patterns, close to saturation. The asymmetry of the interaction matrix in such models leads to violation of detailed balance, ruling out an equilibrium statistical mechanical analysis. Using generating functional methods we derive exact closed equations for dynamical order parameters, viz. the sequence overlap and correlation and response functions.

equation, neuron, phase diagram and storage capacity, (11 more...)

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)

Cristianini, Nello, Campbell, Colin, Shawe-Taylor, John

Dynamically Adapting Kernels in Support Vector Machines

The kernel-parameter is one of the few tunable parameters in Support Vector machines, controlling the complexity of the resulting hypothesis. Its choice amounts to model selection and its value is usually found by means of a validation set. We present an algorithm which can automatically perform model selection with little additional computational cost and with no need of a validation set. In this procedure model selection and learning are not separate, but kernels are dynamically adjusted during the learning process to find the kernel parameter which provides the best possible upper bound on the generalisation error. Theoretical results motivating the approach and experimental results confirming its validity are presented.

generalisation error, kernel parameter, support vector machine, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Wisconsin (0.05)
Europe > United Kingdom > England > Bristol (0.05)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Coolen, Anthony C. C., Saad, David

Dynamics of Supervised Learning with Restricted Training Sets

We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian distributions.

equation, order parameter, supervised learning, (11 more...)

Country:

Europe > United Kingdom (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Barber, David, Wiegerinck, Wim

Tractable Variational Structures for Approximating Graphical Models

Graphical models provide a broad probabilistic framework with applications in speech recognition (Hidden Markov Models), medical diagnosis (Belief networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational framework for approximating these models, we present two classes of distributions, decimatable Boltzmann Machines and Tractable Belief Networks that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small benchmark problem suggest using these richer approximations compares favorably against others previously reported in the literature. 1 Introduction Graphical models provide a powerful framework for probabilistic inference[l] but suffer intractability when applied to large scale problems.

approximation, belief network, tractable variational structure, (16 more...)

Country:

Asia > Middle East > Jordan (0.06)
Europe > Netherlands > Gelderland > Nijmegen (0.05)
North America > United States > Massachusetts (0.04)

Industry: Health & Medicine (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Zemel, Richard S., Dayan, Peter

Distributional Population Codes and Multiple Motion Models

Thresholds for movement direction: two directions are less detectable than one.

distributional population code, population response, stimuli, (13 more...)

Country:

North America > United States > Arizona (0.05)
South America > Venezuela > Capital District > Caracas (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.70)

Technology: Information Technology > Artificial Intelligence > Cognitive Science (0.30)

Stemmler, Martin, Koch, Christof

Information Maximization in Single Neurons

Information from the senses must be compressed into the limited range of firing rates generated by spiking nerve cells. Optimal compression uses all firing rates equally often, implying that the nerve cell's response matches the statistics of naturally occurring stimuli. Since changing the voltage-dependent ionic conductances in the cell membrane alters the flow of information, an unsupervised, non-Hebbian, developmental learning rule is derived to adapt the conductances in Hodgkin-Huxley model neurons. By maximizing the rate of information transmission, each firing rate within the model neuron's limited dynamic range is used equally often. An efficient neuronal representation of incoming sensory information should take advantage of the regularity and scale invariance of stimulus features in the natural world. In the case of vision, this regularity is reflected in the typical probabilities of encountering particular visual contrasts, spatial orientations, or colors [1]. Given these probabilities, an optimized neural code would eliminate any redundancy, while devoting increased representation to commonly encountered features. At the level of a single spiking neuron, information about a potentially large range of stimuli is compressed into a finite range of firing rates, since the maximum firing rate of a neuron is limited. Optimizing the information transmission through a single neuron in the presence of uniform, additive noise has an intuitive interpretation: the most efficient representation of the input uses every firing rate with equal probability.

conductance, firing rate, neuron, (14 more...)

Country:

North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.92)