AITopics

In this article we study the effects of introducing structure in the input distribution of the data to be learnt by a simple perceptron. We determine the learning curves within the framework of Statistical Mechanics. Stepwise generalization occurs as a function of the number of examples when the distribution of patterns is highly anisotropic. Although extremely simple, the model seems to capture the relevant features of a class of Support Vector Machines which was recently shown to present this behavior.

generalization error, subspace, support vector machine, (15 more...)

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
North America > United States > New York (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Ohira, Toru, Sato, Yuzuru, Cowan, Jack D.

Resonance in a Stochastic Neuron Model with Delayed Interaction

Noise-induced coherent oscillations in randomly connected neural networks.

resonance, stochastic neuron model, stochastic resonance, (11 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.91)

Nadeau, Claude, Bengio, Yoshua

Inference for the Generalization Error

In order to to compare learning algorithms, experimental results reported in the machine learning litterature often use statistical tests of significance. Unfortunately, most of these tests do not take into account the variability due to the choice of training set. We perform a theoretical investigation of the variance of the cross-validation estimate of the generalization error that takes into account the variability due to the choice of training sets. This allows us to propose two new ways to estimate this variance. We show, via simulations, that these new statistics perform well relative to the statistics considered by Dietterich (Dietterich, 1998). 1 Introduction When applying a learning algorithm (or comparing several algorithms), one is typically interested in estimating its generalization error. Its point estimation is rather trivial through cross-validation.

algorithm, dietterich, variance, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Mansour, Yishay, McAllester, David A.

Boosting with Multi-Way Branching in Decision Trees

It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree algorithms, such as CART and C4.5, implement a tradeoff between the number of branches and the improvement in tree quality as measured by an index function. Here we give a boosting justification for a particular quantitative tradeoff curve. Our main theorem states, in essence, that if we require an improvement proportional to the log of the number of branches then top-down greedy construction of decision trees remains an effective boosting algorithm.

algorithm, hypothesis, procedure, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Neural Computation with Winner-Take-All as the Only Nonlinear Operation

Maass, Wolfgang

Everybody "knows" that neural networks need more than a single layer of nonlinear units to compute interesting functions. We show that this is false if one employs winner-take-all as nonlinear unit: - Any boolean function can be computed by a single k-winner-takeall unit applied to weighted sums of the input variables.

computational power, neural computation, threshold gate, (14 more...)

Country:

Europe > Austria > Styria > Graz (0.06)
North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Li, Song, Wong, K. Y. Michael

Statistical Dynamics of Batch Learning

An important issue in neural computing concerns the description of learning dynamics with macroscopic dynamical variables. Recent progress on online learning only addresses the often unrealistic case of an infinite training set. We introduce a new framework to model batch learning of restricted sets of examples, widely applicable to any learning cost function, and fully taking into account the temporal correlations introduced by the recycling of the examples. For illustration we analyze the effects of weight decay and early stopping during the learning of teacher-generated examples.

activation, early stopping, weight decay, (12 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Industry: Education (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Li, Jonathan Q., Barron, Andrew R.

Mixture Density Estimation

Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as components of the network are introduced one at a time. In particular, for mixture density estimation we show that a k-component mixture estimated by maximum likelihood (or by an iterative likelihood improvement that we introduce) achieves log-likelihood within order 1/k of the log-likelihood achievable by any convex combination. Consequences for approximation and estimation using Kullback-Leibler risk are also given. A Minimum Description Length principle selects the optimal number of components k that minimizes the risk bound. 1 Introduction In density estimation, Gaussian mixtures provide flexible-basis representations for densities that can be used to model heterogeneous data in high dimensions. Consider a parametric family G { pe(x), x E X C Rd': fJ E The main theme of the paper is to give approximation and estimation bounds of arbitrary densities by finite mixture densities.

approximation, estimation, procedure, (14 more...)

Country: North America > United States > Connecticut > New Haven County > New Haven (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Kabashima, Yoshiyuki, Murayama, Tatsuto, Saad, David, Vicente, Renato

Regular and Irregular Gallager-zype Error-Correcting Codes

The performance of regular and irregular Gallager-type errorcorrecting code is investigated via methods of statistical physics. The transmitted codeword comprises products of the original message bits selected by two randomly-constructed sparse matrices; the number of nonzero row/column elements in these matrices constitutes a family of codes. We show that Shannon's channel capacity may be saturated in equilibrium for many of the regular codes while slightly lower performance is obtained for others which may be of higher practical relevance. Decoding aspects are considered by employing the TAP approach which is identical to the commonly used belief-propagation-based decoding. We show that irregular codes may saturate Shannon's capacity but with improved dynamical properties. 1 Introduction The ever increasing information transmission in the modern world is based on reliably communicating messages through noisy transmission channels; these can be telephone lines, deep space, magnetic storing media etc. Error-correcting codes play a significant role in correcting errors incurred during transmission; this is carried out by encoding the message prior to transmission and decoding the corrupted received code-word for retrieving the original message.

free energy, initial condition, paramagnetic solution, (14 more...)

Country:

Europe > United Kingdom (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.34)

Duffy, Nigel, Helmbold, David P.

Potential Boosters?

Simply changing the potential function allows one to create new algorithms related to AdaBoost. However, these new algorithms are generally not known to have the formal boosting property. This paper examines the question of which potential functions lead to new algorithms that are boosters. The two main results are general sets of conditions on the potential; one set implies that the resulting algorithm is a booster, while the other implies that the algorithm is not. These conditions are applied to previously studied potential functions, such as those used by LogitBoost and Doom II.

algorithm, hypothesis, potential function, (14 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.32)

Bayesian Averaging is Well-Temperated

Hansen, Lars Kai

Often a learning problem has natural quantitative measure of generalization. If a loss function is defined the natural measure is the generalization error, i.e., the expected loss on a random sample independent of the training set. Generalizability is a key topic of learning theory and much progress has been reported. Analytic results for a broad class of machines can be found in the litterature [8, 12, 9, 10] describing the asymptotic generalization ability of supervised algorithms that are continuously parameterized. Asymptotic bounds on generalization for general machines have been advocated by Vapnik [11]. Generalization results valid for finite training sets can only be obtained for specific learning machines, see e.g.

generalization error, predictive distribution, teacher distribution, (14 more...)

Country:

North America > United States > New York (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.32)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)