AITopics

Everybody "knows" that neural networks need more than a single layer of nonlinear units to compute interesting functions. We show that this is false if one employs winner-take-all as nonlinear unit: - Any boolean function can be computed by a single k-winner-takeall unit applied to weighted sums of the input variables.

computational power, neural computation, threshold gate, (14 more...)

Country:

Europe > Austria > Styria > Graz (0.06)
North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Li, Song, Wong, K. Y. Michael

Statistical Dynamics of Batch Learning

An important issue in neural computing concerns the description of learning dynamics with macroscopic dynamical variables. Recent progress on online learning only addresses the often unrealistic case of an infinite training set. We introduce a new framework to model batch learning of restricted sets of examples, widely applicable to any learning cost function, and fully taking into account the temporal correlations introduced by the recycling of the examples. For illustration we analyze the effects of weight decay and early stopping during the learning of teacher-generated examples.

activation, early stopping, weight decay, (12 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Industry: Education (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Li, Jonathan Q., Barron, Andrew R.

Mixture Density Estimation

Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as components of the network are introduced one at a time. In particular, for mixture density estimation we show that a k-component mixture estimated by maximum likelihood (or by an iterative likelihood improvement that we introduce) achieves log-likelihood within order 1/k of the log-likelihood achievable by any convex combination. Consequences for approximation and estimation using Kullback-Leibler risk are also given. A Minimum Description Length principle selects the optimal number of components k that minimizes the risk bound. 1 Introduction In density estimation, Gaussian mixtures provide flexible-basis representations for densities that can be used to model heterogeneous data in high dimensions. Consider a parametric family G { pe(x), x E X C Rd': fJ E The main theme of the paper is to give approximation and estimation bounds of arbitrary densities by finite mixture densities.

approximation, estimation, procedure, (14 more...)

Country: North America > United States > Connecticut > New Haven County > New Haven (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Kabashima, Yoshiyuki, Murayama, Tatsuto, Saad, David, Vicente, Renato

Regular and Irregular Gallager-zype Error-Correcting Codes

The performance of regular and irregular Gallager-type errorcorrecting code is investigated via methods of statistical physics. The transmitted codeword comprises products of the original message bits selected by two randomly-constructed sparse matrices; the number of nonzero row/column elements in these matrices constitutes a family of codes. We show that Shannon's channel capacity may be saturated in equilibrium for many of the regular codes while slightly lower performance is obtained for others which may be of higher practical relevance. Decoding aspects are considered by employing the TAP approach which is identical to the commonly used belief-propagation-based decoding. We show that irregular codes may saturate Shannon's capacity but with improved dynamical properties. 1 Introduction The ever increasing information transmission in the modern world is based on reliably communicating messages through noisy transmission channels; these can be telephone lines, deep space, magnetic storing media etc. Error-correcting codes play a significant role in correcting errors incurred during transmission; this is carried out by encoding the message prior to transmission and decoding the corrupted received code-word for retrieving the original message.

free energy, initial condition, paramagnetic solution, (14 more...)

Country:

Europe > United Kingdom (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.34)

Duffy, Nigel, Helmbold, David P.

Potential Boosters?

Simply changing the potential function allows one to create new algorithms related to AdaBoost. However, these new algorithms are generally not known to have the formal boosting property. This paper examines the question of which potential functions lead to new algorithms that are boosters. The two main results are general sets of conditions on the potential; one set implies that the resulting algorithm is a booster, while the other implies that the algorithm is not. These conditions are applied to previously studied potential functions, such as those used by LogitBoost and Doom II.

algorithm, hypothesis, potential function, (14 more...)

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.32)

Bayesian Averaging is Well-Temperated

Hansen, Lars Kai

Often a learning problem has natural quantitative measure of generalization. If a loss function is defined the natural measure is the generalization error, i.e., the expected loss on a random sample independent of the training set. Generalizability is a key topic of learning theory and much progress has been reported. Analytic results for a broad class of machines can be found in the litterature [8, 12, 9, 10] describing the asymptotic generalization ability of supervised algorithms that are continuously parameterized. Asymptotic bounds on generalization for general machines have been advocated by Vapnik [11]. Generalization results valid for finite training sets can only be obtained for specific learning machines, see e.g.

generalization error, predictive distribution, teacher distribution, (14 more...)

Country:

North America > United States > New York (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.32)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Csató, Lehel, Fokoué, Ernest, Opper, Manfred, Schottky, Bernhard, Winther, Ole

Efficient Approaches to Gaussian Process Classification

The first two methods are related to mean field ideas known in Statistical Physics. The third approach is based on Bayesian online approach which was motivated by recent results in the Statistical Mechanics of Neural Networks. We present simulation results showing: 1. that the mean field Bayesian evidence may be used for hyperparameter tuning and 2. that the online approach may achieve a low training error fast. 1 Introduction Gaussian processes provide promising nonparametric Bayesian approaches to regression and classification [2, 1].

approximation, free energy, likelihood, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Europe > Sweden > Skåne County > Lund (0.04)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Crisp, David J., Burges, Christopher J. C.

A Geometric Interpretation of v-SVM Classifiers

We show that the recently proposed variant of the Support Vector machine (SVM) algorithm, known as v-SVM, can be interpreted as a maximal separation between subsets of the convex hulls of the data, which we call soft convex hulls. The soft convex hulls are controlled by choice of the parameter v. If the intersection of the convex hulls is empty, the hyperplane is positioned halfway between them such that the distance between convex hulls, measured along the normal, is maximized; and if it is not, the hyperplane's normal is similarly determined by the soft convex hulls, but its position (perpendicular distance from the origin) is adjusted to minimize the error sum. The proposed geometric interpretation of v-SVM also leads to necessary and sufficient conditions for the existence of a choice of v for which the v-SVM solution is nontrivial. 1 Introduction Recently, SchOlkopf et al. [I) introduced a new class of SVM algorithms, called v-SVM, for both regression estimation and pattern recognition. The basic idea is to remove the user-chosen error penalty factor C that appears in SVM algorithms by introducing a new variable p which, in the pattern recognition case, adds another degree of freedom to the margin.

convex hull, formulation, geometric interpretation, (11 more...)

Country:

Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Chapelle, Olivier, Vapnik, Vladimir

Model Selection for Support Vector Machines

New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of parameters of the model and the relative quality of performance for any value of parameter.

leave-one-out procedure, support vector, vector, (13 more...)

Country:

North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Burges, Christopher J. C., Crisp, David J.

Uniqueness of the SVM Solution

We give necessary and sufficient conditions for uniqueness of the support vector solution for the problems of pattern recognition and regression estimation, for a general class of cost functions. We show that if the solution is not unique, all support vectors are necessarily at bound, and we give some simple examples of non-unique solutions. We note that uniqueness of the primal (dual) solution does not necessarily imply uniqueness of the dual (primal) solution. We show how to compute the threshold b when the solution is unique, but when all support vectors are at bound, in which case the usual method for determining b does not work. 1 Introduction Support vector machines (SVMs) have attracted wide interest as a means to implement structural risk minimization for the problems of classification and regression estimation. The fact that training an SVM amounts to solving a convex quadratic programming problem means that the solution found is global, and that if it is not unique, then the set of global solutions is itself convex; furthermore, if the objective function is strictly convex, the solution is guaranteed to be unique [1]1.

convex, objective function, support vector, (13 more...)

Country:

Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)