Goto

Collaborating Authors

 Statistical Learning


Leaning by Combining Memorization and Gradient Descent

Neural Information Processing Systems

We have created a radial basis function network that allocates a new computational unit whenever an unusual pattern is presented to the network. The network learns by allocating new units and adjusting the parameters of existing units. If the network performs poorly on a presented pattern, then a new unit is allocated which memorizes the response to the presented pattern. If the network performs well on a presented pattern, then the network parameters are updated using standard LMS gradient descent. For predicting the Mackey Glass chaotic time series, our network learns much faster than do those using back-propagation and uses a comparable number of synapses.


Generalization Properties of Radial Basis Functions

Neural Information Processing Systems

Sherif M. Botros Christopher G. Atkeson Brain and Cognitive Sciences Department and the Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139 Abstract We examine the ability of radial basis functions (RBFs) to generalize. We compare the performance of several types of RBFs. We use the inverse dynamics of an idealized two-joint arm as a test case. We find that without a proper choice of a norm for the inputs, RBFs have poor generalization properties. A simple global scaling of the input variables greatly improves performance.


Speech Recognition Using Connectionist Approaches

Neural Information Processing Systems

This paper is a summary of SPRINT project aims and results. The project focus on the use of neuro-computing techniques to tackle various problems that remain unsolved in speech recognition. First results concern the use of feedforward nets for phonetic units classification, isolated word recognition, and speaker adaptation.


Dynamics of Learning in Recurrent Feature-Discovery Networks

Neural Information Processing Systems

The self-organization of recurrent feature-discovery networks is studied from the perspective of dynamical systems. Bifurcation theory reveals parameter regimesin which multiple equilibria or limit cycles coexist with the equilibrium at which the networks perform principal component analysis.


Generalization Properties of Radial Basis Functions

Neural Information Processing Systems

Atkeson Brain and Cognitive Sciences Department and the Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139 Abstract We examine the ability of radial basis functions (RBFs) to generalize. We compare the performance of several types of RBFs. We use the inverse dynamics ofan idealized two-joint arm as a test case. We find that without a proper choice of a norm for the inputs, RBFs have poor generalization properties. A simple global scaling of the input variables greatly improves performance.



An Analog VLSI Splining Network

Neural Information Processing Systems

Waltham, MA 02254 Abstract We have produced a VLSI circuit capable of learning to approximate arbitrary smoothof a single variable using a technique closely related to splines. The circuit effectively has 512 knots space on a uniform grid and has full support for learning. The circuit also can be used to approximate multi-variable functions as sum of splines. An interesting, and as of yet, nearly untapped set of applications for VLSI implementation ofneural network learning systems can be found in adaptive control and nonlinear signal processing. In most such applications, the learning task consists of approximating a real function of a small number of continuous variables from discrete data points.



Comparison of three classification techniques: CART, C4.5 and Multi-Layer Perceptrons

Neural Information Processing Systems

In this paper, after some introductory remarks into the classification problem asconsidered in various research communities, and some discussions concerning some of the reasons for ascertaining the performances of the three chosen algorithms, viz., CART (Classification and Regression Tree), C4.5 (one of the more recent versions of a popular induction tree technique knownas ID3), and a multi-layer perceptron (MLP), it is proposed to compare the performances of these algorithms under two criteria: classification andgeneralisation. It is found that, in general, the MLP has better classification and generalisation accuracies compared with the other two algorithms. 1 Introduction Classification of data into categories has been pursued by a number of research communities, viz., applied statistics, knowledge acquisition, neural networks. In applied statistics, there are a number of techniques, e.g., clustering algorithms (see e.g., Hartigan), CART (Classification and Regression Trees, see e.g., Breiman et al). Clustering algorithms are used when the underlying data naturally fall into a number of groups, the distance among groups are measured by various metrics [Hartigan]. CART[Breiman, et all has been very popular among applied statisticians. It assumes that the underlying data can be separated into categories, the decision boundaries can either be parallel to the axis or they can be a linear combination of these axes!.


Asymptotic slowing down of the nearest-neighbor classifier

Neural Information Processing Systems

Santosh S. Venkatesh Electrical Engineering University of Pennsylvania Philadelphia, PA 19104 If patterns are drawn from an n-dimensional feature space according to a probability distribution that obeys a weak smoothness criterion, we show that the probability that a random input pattern is misclassified by a nearest-neighbor classifier using M random reference patterns asymptotically satisfies a PM(error) "" Poo(error) M2/n' for sufficiently large values of M. Here, Poo(error) denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free.We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well known "curse of dimensionality." 1 INTRODUCTION One of the primary tasks assigned to neural networks is pattern classification.