Goto

Collaborating Authors

 Technology


Discriminability-Based Transfer between Neural Networks

Neural Information Processing Systems

Neural networks are usually trained from scratch, relying only on the training data for guidance. However, as more and more networks are trained for various tasks, it becomes reasonable to seek out methods that.


Assessing and Improving Neural Network Predictions by the Bootstrap Algorithm

Neural Information Processing Systems

The bootstrap method offers an computation intensive alternative to estimate the predictive distribution for a neural network even if the analytic derivation is intractable. The available asymptotic results show that it is valid for a large number of linear, nonlinear and even nonparametric regression problems. It has the potential to model the distribution of estimators to a higher precision than the usual normal asymptotics. It even may be valid if the normal asymptotics fail. However, the theoretical properties of bootstrap procedures for neural networks - especially nonlinear models - have to be investigated more comprehensively.


Generalization Abilities of Cascade Network Architecture

Neural Information Processing Systems

In [5], a new incremental cascade network architecture has been presented. This paper discusses the properties of such cascade networks and investigates their generalization abilities under the particular constraint of small data sets. The evaluation is done for cascade networks consisting of local linear maps using the Mackey Glass time series prediction task as a benchmark. Our results indicate that to bring the potential of large networks to bear on the problem of ning extracting information from small data sets without run the risk of overjitting, deeply cascaded network architectures are more favorable than shallow broad architectures that contain the same number of nodes. 1 Introduction For many real-world applications, a major constraint for the successful learning from examples is the limited number of examples available. Thus, methods are required, that can learn from small data sets. This constraint makes the problem of generalization particularly hard.


Time Warping Invariant Neural Networks

Neural Information Processing Systems

We proposed a model of Time Warping Invariant Neural Networks (TWINN) to handle the time warped continuous signals. Although TWINN is a simple modification of well known recurrent neural network, analysis has shown that TWINN completely removes time warping and is able to handle difficult classification problem. It is also shown that TWINN has certain advantages over the current available sequential processing schemes: Dynamic Programming(DP)[I], Hidden Markov Model( HMM)[2], Time Delayed Neural Networks(TDNN) [3] and Neural Network Finite Automata(NNFA)[4]. We also analyzed the time continuity employed in TWINN and pointed out that this kind of structure can memorize longer input history compared with Neural Network Finite Automata (NNFA). This may help to understand the well accepted fact that for learning grammatical reference with NNF A one had to start with very short strings in training set. The numerical example we used is a trajectory classification problem. This problem, making a feature of variable sampling rates, having internal states, continuous dynamics, heavily time-warped data and deformed phase space trajectories, is shown to be difficult to other schemes. With TWINN this problem has been learned in 100 iterations. For benchmark we also trained the exact same problem with TDNN and completely failed as expected.


Directional-Unit Boltzmann Machines

Neural Information Processing Systems

University of Toronto University of Toronto University of Colorado Toronto, ONT M5S lA4 Toronto, ONT M5S lA4 Boulder, CO 80309-0430 Abstract We present a general formulation for a network of stochastic directional units. This formulation is an extension of the Boltzmann machine in which the units are not binary, but take on values in a cyclic range, between 0 and 271' radians. The conditional distribution of a unit's stochastic state is a circular version of the Gaussian probability distribution, known as the von Mises distribution. This combination of a value and a certainty provides additional representational power in a unit. Many kinds of information can naturally be represented in terms of angular, or directional, variables.


Second order derivatives for network pruning: Optimal Brain Surgeon

Neural Information Processing Systems

We investigate the use of information from all second order derivatives of the error function to perfonn network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is Significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Sol1a, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix HI from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weighL decay on three benchmark MONK's problems [Thrun et aI., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987J used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.



A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively

Neural Information Processing Systems

A new boundary hunting radial basis function (BH-RBF) classifier which allocates RBF centers constructively near class boundaries is described. This classifier creates complex decision boundaries only in regions where confusions occur and corresponding RBF outputs are similar. A predicted square error measure is used to determine how many centers to add and to determine when to stop adding centers. Two experiments are presented which demonstrate the advantages of the BH RBF classifier. One uses artificial data with two classes and two input features where each class contains four clusters but only one cluster is near a decision region boundary.


Metamorphosis Networks: An Alternative to Constructive Models

Neural Information Processing Systems

Given a set oft raining examples, determining the appropriate number of free parameters is a challenging problem. Constructive learning algorithms attempt to solve this problem automatically by adding hidden units, and therefore free parameters, during learning. We explore an alternative class of algorithms-called metamorphosis algorithms-in which the number of units is fixed, but the number of free parameters gradually increases during learning. The architecture we investigate is composed of RBF units on a lattice, which imposes flexible constraints on the parameters of the network. Virtues of this approach include variable subset selection, robust parameter selection, multiresolution processing, and interpolation of sparse training data.


Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison

Neural Information Processing Systems

A performance comparison of two self-organizing networks, the Kohonen Feature Map and the recently proposed Growing Cell Structures is made. For this purpose several performance criteria for self-organizing networks are proposed and motivated. The models are tested with three example problems of increasing difficulty. The Kohonen Feature Map demonstrates slightly superior results only for the simplest problem.