Benchmarking Feed-Forward Neural Networks: Models and Measures

Neural Information Processing Systems

Existing metrics for the learning performance of feed-forward neural networks do not provide a satisfactory basis for comparison, because the choice of training epoch limit can determine the outcome of the comparison. I propose new metrics that have the desirable property of being independent of the training epoch limit. The efficiency measures the yield of correct networks in proportion to the training effort expended, and the optimal epoch limit is the one that provides the greatest efficiency. The learning performance is modelled statistically, and asymptotic performance is estimated. Implementation details may be found in Harney (1992).
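The efficiency and the optimal epoch limit can be illustrated with a small calculation. The sketch below assumes one plausible reading of the measure: each training trial runs until it produces a correct network or reaches the epoch limit, the effort is the total number of epochs spent, and the efficiency is correct networks per epoch. The function name `efficiency_curve` and the toy trial data are hypothetical, and Harney's exact definitions may differ.

```python
import numpy as np


def efficiency_curve(epochs_to_success, limits):
    """Efficiency (correct networks per training epoch) for each candidate epoch limit.

    Assumed model: epochs_to_success[i] is the epoch at which trial i first produced
    a correct network (np.inf if it never did). A trial stops at success or at the limit.
    """
    e = np.asarray(epochs_to_success, dtype=float)
    efficiencies = []
    for limit in limits:
        successes = np.count_nonzero(e <= limit)   # yield of correct networks
        effort = np.minimum(e, limit).sum()        # total epochs expended
        efficiencies.append(successes / effort)
    return np.array(efficiencies)


# Hypothetical trials: three succeed (at 10, 20, 30 epochs), two never converge.
trials = [10, 20, 30, np.inf, np.inf]
limits = np.array([10, 20, 30, 40])
eff = efficiency_curve(trials, limits)
best_limit = limits[np.argmax(eff)]   # the optimal epoch limit under this model
```

With these toy numbers a limit of 30 epochs gives the highest efficiency: stopping earlier abandons trials that were about to succeed, while running longer only adds epochs to trials that never converge.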


A Comparison of Projection Pursuit and Neural Network Regression Modeling

Neural Information Processing Systems

Two projection-based feedforward network learning methods for model-free regression problems are studied and compared in this paper: one is the popular back-propagation learning (BPL), and the other is projection pursuit learning (PPL).
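Both methods fit models of the same projection-based form, f(x) = Σ_k β_k g_k(a_k · x). In BPL with one hidden layer, each g_k is a fixed sigmoid; in PPL, each ridge function g_k is itself estimated from the data. The sketch below shows that shared form. The function names are hypothetical, bias terms are omitted, and `np.polyfit` is only a stand-in for whatever smoother a PPL implementation would use to estimate g_k along a fixed direction.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def projection_model(x, directions, ridge_fns, betas):
    """f(x) = sum_k beta_k * g_k(a_k . x), with directions a_k as rows (bias terms omitted)."""
    z = x @ directions.T                      # (n_samples, K) projections
    return sum(b * g(z[:, k]) for k, (g, b) in enumerate(zip(ridge_fns, betas)))


def fit_ridge_function(x, residual, direction, degree=3):
    """PPL-style step: estimate g nonparametrically along a fixed direction.
    A low-degree polynomial stands in for a proper smoother (an assumption)."""
    z = x @ direction
    return np.poly1d(np.polyfit(z, residual, degree))


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
a = np.array([0.6, 0.8])

# BPL form: every ridge function is the same fixed sigmoid.
y_bpl = projection_model(x, np.array([a]), [sigmoid], [2.0])

# PPL form: the ridge function is learned; here the target is (a . x)^2.
y = (x @ a) ** 2
g = fit_ridge_function(x, y, a)
y_ppl = projection_model(x, np.array([a]), [g], [1.0])
```

The only structural difference in this sketch is whether `ridge_fns` holds a fixed activation or a function fitted to the residual.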


Human and Machine 'Quick Modeling'

Neural Information Processing Systems

We present here an interesting experiment in 'quick modeling' by humans, performed independently on small samples, in several languages and on two continents, over the last three years. Comparisons to decision tree procedures and neural net processing are given. From these, we conjecture that human reasoning is better represented by the latter, but is substantially different from both. Implications for the 'strong convergence hypothesis' between neural networks and machine learning are discussed, now expanded to include comparisons with human reasoning.

1 INTRODUCTION

Until recently the fields of symbolic and connectionist learning evolved separately. In the last two years, however, a significant number of papers comparing the two methodologies have appeared. A beginning synthesis of these two fields was forged at the NIPS '90 Workshop #5 last year (Pratt and Norton, 1990), where one may find a good bibliography of the recent work of Atlas, Dietterich, Omohundro, Sanger, Shavlik, Tsoi, Utgoff and others. It was at that NIPS '90 Workshop that we learned of these studies, most of which concentrate on performance comparisons of decision tree algorithms (such as ID3 and CART) and neural net algorithms (such as Perceptrons and Backpropagation). Independently, three years ago we had looked at Quinlan's ID3 scheme (Quinlan, 1984). Intuitively, and almost immediately, we disagreed with the generalization that ID3 obtains when it extends a sample of 8 items to 12 items, so we subjected this example to a variety of human experiments. We report our findings and compare them with the performance of ID3 and of various neural net computations.


A Topographic Product for the Optimization of Self-Organizing Feature Maps

Neural Information Processing Systems

Self-organizing feature maps like the Kohonen map (Kohonen, 1989; Ritter et al., 1990) not only provide a plausible explanation for the formation of maps in brains, e.g. in the visual system (Obermayer et al., 1990), but have also been applied to problems such as vector quantization and robot arm control (Martinetz et al., 1990). The underlying organizing principle is the preservation of neighborhood relations. For this principle to yield the most useful map, the topological structure of the output space must roughly fit the structure of the input data. In technical applications, however, this structure is often not known a priori. For this reason, several attempts have been made to modify the Kohonen algorithm so that not only the weights but also the output space topology itself is adapted during learning (Kangas et al., 1990; Martinetz et al., 1991). Our contribution is also concerned with optimal output space topologies, but we follow a different approach, which avoids a possibly complicated structure of the output space. First we describe a quantitative measure for the preservation of neighborhood relations in maps, the topographic product P. The topographic product was originally introduced under the name "wavering product" in nonlinear dynamics, in order to optimize the embeddings of chaotic attractors (Liebert et al., 1991).
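The excerpt does not give the formula for P. The sketch below follows the definition usually given for the topographic product, which is an assumption here: for each unit j and each order k, compare the k-th nearest neighbor of j in output space A with its k-th nearest neighbor in input space V (measured between weight vectors), combine the distance ratios Q1 and Q2 into a geometric mean P3, and average log P3 over all j and k. Under this definition P is 0 when the two neighbor orderings coincide. Ties in distance are broken by unit index, and weight vectors must be pairwise distinct so that no distance ratio is zero. The function name is hypothetical.

```python
import numpy as np


def topographic_product(weights, grid):
    """Topographic product P of a map (sketch of the commonly cited definition).

    weights: (N, d_V) weight vectors in input space V (must be pairwise distinct)
    grid:    (N, d_A) unit positions in output space A
    """
    w = np.asarray(weights, dtype=float)
    a = np.asarray(grid, dtype=float)
    n = len(w)
    d_v = np.linalg.norm(w[:, None, :] - w[None, :, :], axis=-1)
    d_a = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    k = np.arange(1, n)                                  # neighbor orders 1..N-1
    total = 0.0
    for j in range(n):
        others = np.delete(np.arange(n), j)
        # k-th nearest neighbors of j in output space (n_a) and input space (n_v)
        n_a = others[np.argsort(d_a[j, others], kind="stable")]
        n_v = others[np.argsort(d_v[j, others], kind="stable")]
        q1 = d_v[j, n_a] / d_v[j, n_v]                   # distance ratios in V
        q2 = d_a[j, n_a] / d_a[j, n_v]                   # distance ratios in A
        log_p3 = np.cumsum(np.log(q1) + np.log(q2)) / (2 * k)
        total += log_p3.sum()
    return total / (n * (n - 1))


# A 1-D chain whose weights lie on a line in the same order: neighborhoods match.
grid = np.arange(6, dtype=float)[:, None]
p_matched = topographic_product(2.0 * grid, grid)

# The same chain with two weight vectors swapped, for comparison.
shuffled = 2.0 * grid.copy()
shuffled[[1, 4]] = shuffled[[4, 1]]
p_shuffled = topographic_product(shuffled, grid)
```

Because only distance ratios enter, uniformly scaling the weights (the factor 2.0 above) leaves P unchanged.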


Shooting Craps in Search of an Optimal Strategy for Training Connectionist Pattern Classifiers

Neural Information Processing Systems

We compare two strategies for training connectionist (as well as nonconnectionist) models for statistical pattern recognition. The probabilistic strategy is based on the notion that Bayesian discrimination (i.e., optimal classification) is achieved when the classifier learns the a posteriori class distributions of the random feature vector. The differential strategy is based on the notion that the identity of the largest class a posteriori probability of the feature vector is all that is needed to achieve Bayesian discrimination. Each strategy is directly linked to a family of objective functions that can be used in the supervised training procedure. We prove that the probabilistic strategy, which is linked with the error-measure objective functions typically used to train classifiers (such as mean-squared error and cross-entropy), necessarily requires larger training sets and more complex classifier architectures than those needed to approximate the Bayesian discriminant function.
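The difference between the two strategies shows up in how each objective treats classifier outputs. The sketch below contrasts cross-entropy, one of the probabilistic-strategy objectives named above, with a differential-style objective that depends only on the margin between the correct class's output and its strongest competitor. The sigmoid-of-margin form, the steepness parameter `beta`, and the function names are hypothetical illustrations, not the paper's objective function.

```python
import numpy as np


def cross_entropy(outputs, target):
    """Probabilistic strategy: outputs are pushed toward the class posteriors."""
    p = np.asarray(outputs, dtype=float)
    p = p / p.sum()                       # treat (positive) outputs as a distribution
    return -np.log(p[target])


def differential_loss(outputs, target, beta=4.0):
    """Hypothetical differential-style loss: depends only on how far the correct
    output exceeds the largest competing output (its margin)."""
    o = np.asarray(outputs, dtype=float)
    margin = o[target] - np.delete(o, target).max()
    return 1.0 / (1.0 + np.exp(beta * margin))   # small when the margin is large


confident = np.array([0.6, 0.3, 0.1])
hesitant = np.array([0.40, 0.35, 0.25])
# Both output vectors select class 0, which is all Bayesian discrimination needs.
same_decision = np.argmax(confident) == np.argmax(hesitant) == 0
```

Cross-entropy still penalizes the hesitant output even though it already yields the correct decision, while the differential loss is unchanged when a constant is added to every output. This is one way to see why matching posteriors asks more of a classifier than ranking classes correctly.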


A Weighted Probabilistic Neural Network

Neural Information Processing Systems

The Probabilistic Neural Network (PNN) algorithm represents the likelihood function of a given class as the sum of identical, isotropic Gaussians. In practice, PNN is often an excellent pattern classifier, outperforming other classifiers including backpropagation.
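A minimal sketch of the standard PNN decision rule described above: each class likelihood is estimated as an average of identical isotropic Gaussians centered on that class's training points, and the input is assigned to the class with the largest estimate. Equal class priors, the smoothing width `sigma`, and the function name are assumptions, and the weighted variant named in the title is not shown.

```python
import numpy as np


def pnn_predict(x, train_x, train_y, sigma=1.0):
    """Classify x with a Probabilistic Neural Network (equal priors assumed)."""
    x = np.asarray(x, dtype=float)
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    sq_dist = np.sum((train_x - x) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))      # identical isotropic Gaussians
    classes = np.unique(train_y)
    # Class-conditional likelihood estimate: mean kernel value over that class's points
    likelihoods = np.array([kernel[train_y == c].mean() for c in classes])
    return classes[np.argmax(likelihoods)], likelihoods


train_x = [[0.0, 0.0], [0.5, 0.2], [3.0, 3.0], [3.2, 2.7]]
train_y = [0, 0, 1, 1]
label, scores = pnn_predict([0.3, 0.1], train_x, train_y, sigma=0.5)
```

The single shared `sigma` is exactly the "identical, isotropic" restriction: every dimension and every training point gets the same kernel width.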


A Network of Localized Linear Discriminants

Neural Information Processing Systems

The localized linear discriminant network (LLDN) has been designed to address classification problems containing relatively closely spaced data from different classes (encounter zones [1], the accuracy problem [2]). Locally trained hyperplane segments are an effective way to define the decision boundaries for these regions [3]. The LLD uses a modified perceptron training algorithm to discover separating hyperplane/sigmoid units within narrow boundaries. The basic unit of the network is the discriminant receptive field (DRF), which combines the LLD function with Gaussians representing the dispersion of the local training data with respect to the hyperplane. The DRF implements a local distance measure [4] and obtains the benefits of networks of localized units [5]. A constructive algorithm for the two-class case is described, which incorporates DRFs into the hidden layer to solve local discrimination problems. The output unit produces a smoothed, piecewise linear decision boundary. Preliminary results indicate that the LLDN can efficiently achieve separation when boundaries are narrow and complex, in cases where both the "standard" multilayer perceptron (MLP) and k-nearest neighbor (KNN) classifiers yield high error rates on training data.

1 The LLD Training Algorithm and DRF Generation

The LLD is defined by the hyperplane normal vector V and its "midpoint" M (a translated origin [1] near the center of gravity of the training data in feature space).
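Since the LLD is defined by the normal vector V and the midpoint M, a hyperplane/sigmoid unit can be written as a sigmoid of V · (x - M). The excerpt does not specify how the DRF combines this with its Gaussians, so the `drf` function below is a hypothetical sketch: it gates the LLD output with a single Gaussian over the component of x - M lying within the hyperplane, which keeps the unit's influence local to the region near M. The `gain` and `spread` parameters are also assumptions.

```python
import numpy as np


def lld(x, V, M, gain=1.0):
    """Hyperplane/sigmoid unit: sigmoid of the signed projection of x - M onto V."""
    return 1.0 / (1.0 + np.exp(-gain * np.dot(V, np.asarray(x, float) - M)))


def drf(x, V, M, spread=1.0, gain=1.0):
    """Hypothetical discriminant receptive field: LLD output weighted by a Gaussian
    on the distance from M measured along (parallel to) the hyperplane."""
    d = np.asarray(x, float) - M
    v_hat = V / np.linalg.norm(V)
    parallel = d - np.dot(d, v_hat) * v_hat           # component lying in the hyperplane
    locality = np.exp(-np.dot(parallel, parallel) / (2.0 * spread ** 2))
    return lld(x, V, M, gain) * locality


V = np.array([1.0, 0.0])       # hyperplane normal
M = np.array([0.0, 0.0])       # midpoint (translated origin)
near_positive = drf([0.5, 0.1], V, M)    # close to M, positive side
far_along_plane = drf([0.5, 6.0], V, M)  # same side, but far from M along the plane
```

Both test points lie on the same side of the hyperplane, so `lld` gives them the same output; only the locality factor separates them.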


Unsupervised Classifiers, Mutual Information and 'Phantom Targets'

Neural Information Processing Systems

We derive criteria for training adaptive classifier networks to perform unsupervised data analysis. The first criterion turns a simple Gaussian classifier into a simple Gaussian mixture analyser. The second criterion, which is much more generally applicable, is based on mutual information.
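For a classifier with softmax-style outputs p(c | x), a standard way to estimate the mutual information between the input and the class label over a batch is I = H(mean_x p(c | x)) - mean_x H(p(c | x)). It is high when each output is confident but the classes are used equally often across the data. The sketch below computes this estimate in bits; treating it as the exact training criterion of the paper is an assumption, and the function names are hypothetical.

```python
import numpy as np


def entropy_bits(p, axis=-1):
    p = np.clip(p, 1e-12, 1.0)                 # avoid log(0)
    return -np.sum(p * np.log2(p), axis=axis)


def mutual_information(probs):
    """Estimate I(class; input) in bits from a (batch, classes) array of p(c | x)."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)                  # average class usage
    return entropy_bits(marginal) - entropy_bits(probs).mean()


confident_balanced = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
undecided = np.full((4, 2), 0.5)
```

The first term rewards using all classes and the second rewards confident individual decisions, so maximizing I discourages both collapsing every input into one class and hedging on every input.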


Data Analysis using G/SPLINES

Neural Information Processing Systems

G/SPLINES is an algorithm for building functional models of data. It uses genetic search to discover combinations of basis functions, which are then used to build a least-squares regression model. Because it produces a population of models that evolve over time, rather than a single model, it allows analysis not possible with other regression-based approaches.

1 INTRODUCTION

G/SPLINES is a hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm (Friedman, 1990) with Holland's Genetic Algorithm (Holland, 1975). G/SPLINES has advantages over MARS in that it requires fewer least-squares computations, is easily extendable to non-spline basis functions, may discover models inaccessible to local-variable selection algorithms, and allows significantly larger problems to be considered. These issues are discussed in Rogers (1991). This paper begins with a discussion of linear regression models, followed by a description of the G/SPLINES algorithm, and finishes with a series of experiments illustrating its performance, robustness, and analysis capabilities.
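The hybrid can be sketched compactly: each model is a variable-length list of truncated linear spline basis functions max(0, ±(x_v - t)), every candidate is scored by a least-squares fit, and a genetic loop recombines and mutates the better models. Everything specific here is an assumption for illustration. The one-point crossover on basis lists, the add-or-drop mutation, the fitness (mean squared error plus a hypothetical per-term penalty), and all function names are stand-ins, not Rogers' operators or scoring function.

```python
import numpy as np

rng = np.random.default_rng(0)


def hinge(x, var, knot, sign):
    """Truncated linear spline basis function: max(0, sign * (x_var - knot))."""
    return np.maximum(0.0, sign * (x[:, var] - knot))


def design(x, model):
    cols = [np.ones(len(x))]                   # intercept is always included
    cols += [hinge(x, var, knot, sign) for var, knot, sign in model]
    return np.column_stack(cols)


def sse(x, y, model):
    """Residual sum of squares of the least-squares fit on the model's basis."""
    A = design(x, model)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def fitness(x, y, model, penalty=0.01):
    # Hypothetical score (lower is better): MSE plus a per-term size penalty.
    return sse(x, y, model) / len(y) + penalty * len(model)


def random_basis(x):
    var = int(rng.integers(x.shape[1]))
    return (var, float(rng.choice(x[:, var])), float(rng.choice([-1.0, 1.0])))


def crossover(a, b):
    # Child takes the front of one parent's basis list and the back of the other's.
    return a[: int(rng.integers(len(a) + 1))] + b[int(rng.integers(len(b) + 1)):]


def mutate(model, x):
    model = list(model)
    if model and rng.random() < 0.5:
        model.pop(int(rng.integers(len(model))))    # drop a basis function
    else:
        model.append(random_basis(x))               # add a random basis function
    return model


def gsplines(x, y, pop_size=30, generations=50, init_terms=3):
    population = [[random_basis(x) for _ in range(init_terms)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(x, y, m))
        parents = population[: pop_size // 2]       # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            i, j = rng.integers(len(parents), size=2)
            children.append(mutate(crossover(parents[i], parents[j]), x))
        population = parents + children
    population.sort(key=lambda m: fitness(x, y, m))
    return population                               # whole population, best first


x = rng.uniform(-1, 1, size=(200, 2))
y = np.maximum(0.0, x[:, 0] - 0.2) + 0.1 * rng.normal(size=200)
models = gsplines(x, y)
```

Because `gsplines` returns the whole final population rather than one model, one can, for example, count how often each input variable appears across the better models.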


Information Measure Based Skeletonisation

Neural Information Processing Systems

Automatic determination of proper neural network topology by trimming oversized networks is an important area of study, which has previously been addressed using a variety of techniques. In this paper, we present Information Measure Based Skeletonisation (IMBS), a new approach to this problem in which superfluous hidden units are removed based on their information measure (IM). This measure, borrowed from decision tree induction techniques, reflects the degree to which the hyperplane formed by a hidden unit discriminates between training data classes. We show the results of applying IMBS to three classification tasks and demonstrate that it removes a substantial number of hidden units without significantly affecting network performance.
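The information measure of a hidden unit can be computed the way decision tree induction scores a candidate split. The sketch below assumes that a unit's hyperplane assigns each training example to one side according to whether the unit's sigmoid activation exceeds 0.5, and scores the unit by the information gain of that split with respect to the class labels. The 0.5 threshold, the pruning cutoff `min_im`, and the function names are hypothetical.

```python
import numpy as np


def entropy_bits(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_measure(activations, labels, threshold=0.5):
    """Information gain (bits) of splitting the training set by one hidden unit."""
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    side = activations > threshold             # which side of the hyperplane
    gain = entropy_bits(labels)
    for mask in (side, ~side):
        if mask.any():
            gain -= mask.mean() * entropy_bits(labels[mask])
    return gain


def units_to_keep(hidden_activations, labels, min_im=0.1):
    """Indices of hidden units whose IM reaches the (hypothetical) cutoff."""
    ims = [information_measure(h, labels) for h in np.asarray(hidden_activations).T]
    return [i for i, im in enumerate(ims) if im >= min_im]


labels = np.array([0, 0, 1, 1])
# Columns are hidden units: unit 0 separates the classes, unit 1 is nearly constant.
hidden = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.9], [0.8, 0.85]])
```

Here unit 0 splits the two classes perfectly (1 bit), while unit 1 puts every example on the same side (0 bits) and would be a candidate for removal.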