Combining Estimators Using Non-Constant Weighting Functions

Neural Information Processing Systems

Volker Tresp* and Michiaki Taniguchi, Siemens AG, Central Research, Otto-Hahn-Ring 6, 81730 München, Germany

Abstract: This paper discusses the linearly weighted combination of estimators in which the weighting functions are dependent on the input. We show that the weighting functions can be derived either by evaluating the input-dependent variance of each estimator or by estimating how likely it is that a given estimator has seen data in the region of the input space close to the input pattern. The latter solution is closely related to the mixture-of-experts approach, and we show how learning rules for the mixture of experts can be derived from the theory of learning with missing features. The presented approaches are modular, since the weighting functions can easily be modified (no retraining) if more estimators are added. Furthermore, it is easy to incorporate estimators which were not derived from data, such as expert systems or algorithms. 1 Introduction Instead of modeling the global dependency between input x ∈ D and output y ∈ ℝ using a single estimator, it is often very useful to decompose a complex mapping

*At the time of the research for this paper, a visiting researcher at the Center for Biological and Computational Learning, MIT.
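As an illustration of the variance-based weighting described in the abstract above, the sketch below combines several estimators' predictions with input-dependent inverse-variance weights. The toy estimators, variance estimates, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def combine_predictions(preds, variances):
    """Combine estimator outputs with input-dependent inverse-variance weights.

    preds, variances: arrays of shape (n_estimators, n_points).
    Weights are normalised to sum to one at every input point.
    """
    weights = 1.0 / (variances + 1e-12)            # low variance -> high weight
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * preds).sum(axis=0)

# Toy example: two estimators, the second grows noisier to the right.
x = np.linspace(0, 1, 5)
preds = np.stack([np.sin(x) + 0.01, np.sin(x) - 0.1])
variances = np.stack([np.full_like(x, 0.01), 0.01 + x])  # per-input variance estimates
print(combine_predictions(preds, variances))
```

Because the weights are recomputed per input point, adding a new estimator only requires an estimate of its input-dependent variance, matching the modularity the abstract emphasises.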




Limits on Learning Machine Accuracy Imposed by Data Quality

Neural Information Processing Systems

Random errors and insufficiencies in databases limit the performance of any classifier trained from and applied to the database. In this paper we propose a method to estimate the limiting performance of classifiers imposed by the database. We demonstrate this technique on the task of predicting failure in telecommunication paths. 1 Introduction Data collection for a classification or regression task is prone to random errors, e.g.
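The general point can be made concrete with a simple simulation (an illustrative bound, not the estimation method the paper proposes): if a fraction of labels in a two-class database is flipped at random, even a perfect classifier's measured accuracy is capped accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative only: random label noise caps measurable accuracy.
n, noise_rate = 100_000, 0.08
true_labels = rng.integers(0, 2, size=n)
flip = rng.random(n) < noise_rate
recorded_labels = np.where(flip, 1 - true_labels, true_labels)

# A hypothetical perfect classifier predicts the true labels exactly,
# yet its accuracy against the noisy database cannot exceed 1 - noise_rate.
accuracy = (true_labels == recorded_labels).mean()
print(accuracy)   # approx 0.92
```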


Model of a Biological Neuron as a Temporal Neural Network

Neural Information Processing Systems

A biological neuron can be viewed as a device that maps a multidimensional temporal event signal (dendritic postsynaptic activations) into a unidimensional temporal event signal (action potentials). We have designed a network, the Spatio-Temporal Event Mapping (STEM) architecture, which can learn to perform this mapping for arbitrary biophysical models of neurons. Such a network, appropriately trained, called a STEM cell, can be used in place of a conventional compartmental model in simulations where only the transfer function is important, such as network simulations. The STEM cell offers advantages over compartmental models in terms of computational efficiency, analytical tractability, and as a framework for VLSI implementations of biological neurons.
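The STEM architecture itself is not reproduced in this abstract; purely to illustrate the kind of mapping involved, the sketch below maps a window of multichannel input events to a single event probability per time step. All parameters and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the mapping a STEM-like cell learns: a window of
# recent multichannel "postsynaptic" activity is mapped to a single
# event probability per time step. (Illustrative only; this is not
# the STEM architecture from the paper.)
n_channels, window = 4, 10
w = rng.normal(scale=0.1, size=(n_channels, window))
bias = -1.0

def event_probability(history):
    """history: (n_channels, window) array of recent input activity."""
    drive = np.sum(w * history) + bias
    return 1.0 / (1.0 + np.exp(-drive))   # logistic output: "spike" probability

print(event_probability(rng.random((n_channels, window))))
```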


The Electrotonic Transformation: a Tool for Relating Neuronal Form to Function

Neural Information Processing Systems

The spatial distribution and time course of electrical signals in neurons have important theoretical and practical consequences. Because it is difficult to infer how neuronal form affects electrical signaling, we have developed a quantitative yet intuitive approach to the analysis of electrotonus. This approach transforms the architecture of the cell from anatomical to electrotonic space, using the logarithm of voltage attenuation as the distance metric. We describe the theory behind this approach and illustrate its use. 1 INTRODUCTION The fields of computational neuroscience and artificial neural nets have enjoyed a mutually beneficial exchange of ideas. This has been most evident at the network level, where concepts such as massive parallelism, lateral inhibition, and recurrent excitation have inspired both the analysis of brain circuits and the design of artificial neural net architectures.
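The distance metric mentioned above can be written down directly: the electrotonic distance between two points is the logarithm of the voltage attenuation between them, which makes distances additive along a path. A minimal sketch with hypothetical voltages:

```python
import math

def electrotonic_distance(v_source, v_target):
    """Log-attenuation distance between two points on a neuron.

    Attenuation A = v_source / v_target >= 1 for passive spread, so the
    distance log(A) is >= 0 and, usefully, additive along a path:
    log(A_12 * A_23) = log(A_12) + log(A_23).
    """
    return math.log(v_source / v_target)

# Hypothetical steady-state voltages at soma, mid-dendrite, and tip:
d1 = electrotonic_distance(10.0, 4.0)   # soma -> mid
d2 = electrotonic_distance(4.0, 1.0)    # mid -> tip
print(d1 + d2, electrotonic_distance(10.0, 1.0))  # equal: the metric is additive
```

Additivity is what lets the whole dendritic tree be redrawn in "electrotonic space" with these log-attenuation values as branch lengths.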


Bayesian Query Construction for Neural Network Models

Neural Information Processing Systems

If data collection is costly, there is much to be gained by actively selecting particularly informative data points in a sequential way. In a Bayesian decision-theoretic framework we develop a query selection criterion which explicitly takes into account the intended use of the model predictions. By Markov Chain Monte Carlo methods the necessary quantities can be approximated to a desired precision. As the number of data points grows, the model complexity is modified by a Bayesian model selection strategy. The properties of two versions of the criterion are demonstrated in numerical experiments.
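As a simplified illustration of sequential query selection (ignoring the decision-theoretic weighting the paper adds), candidate inputs can be scored by predictive variance over posterior samples; here a toy ensemble stands in for MCMC draws, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for MCMC posterior samples: an ensemble of linear models.
posterior_weights = rng.normal(size=(50, 2))     # 50 "samples", 2 parameters

def predictive_variance(x):
    """Variance of predictions over posterior samples at input x."""
    features = np.array([1.0, x])                # bias + linear feature
    preds = posterior_weights @ features
    return preds.var()

candidates = np.linspace(-3, 3, 13)
scores = [predictive_variance(x) for x in candidates]
next_query = candidates[int(np.argmax(scores))]
print(next_query)   # most informative point under this simple criterion
```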


Adaptive Elastic Input Field for Recognition Improvement

Neural Information Processing Systems

For machines to perform classification tasks, such as speech and character recognition, appropriately handling deformed patterns is a key to achieving high performance. The authors present a new type of classification system, an Adaptive Input Field Neural Network (AIFNN), which includes a simple pre-trained neural network and an elastic input field attached to an input layer. By using an iterative method, AIFNN can determine an optimal affine translation for an elastic input field to compensate for the original deformations. The convergence of the AIFNN algorithm is shown. AIFNN is applied to handwritten numeral recognition. Consequently, 10.83% of originally misclassified patterns are correctly categorized and total performance is improved, without modifying the neural network. 1 Introduction For machines to accomplish classification tasks, such as speech and character recognition, appropriately handling deformed patterns is a key to achieving high performance [Simard 92] [Simard 93] [Hinton 92] [Barnard 91]. The number of reasonable deformations of patterns is enormous, since they can be either linear translations (an affine translation or a time shift) or nonlinear deformations (a set of combinations of partial translations), or both. Although a simple neural network (e.g. a 3-layered neural network) is able to adapt
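To illustrate the compensation idea, the sketch below searches over small translations of the input and keeps the one the pre-trained classifier is most confident about; the grid search stands in for the paper's iterative affine optimisation, and the helper names and dummy classifier are invented.

```python
import numpy as np
from scipy.ndimage import shift

def compensate_and_classify(image, classify, max_shift=2):
    """Try small integer translations of the input and return the label the
    pre-trained classifier is most confident about. (Grid search stands in
    for the paper's iterative affine optimisation; the network itself is
    never modified.)"""
    best_prob, best_label = -1.0, None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = shift(image, (dy, dx), order=0, cval=0.0)
            probs = classify(candidate)          # classifier left unchanged
            if probs.max() > best_prob:
                best_prob, best_label = probs.max(), int(np.argmax(probs))
    return best_label, best_prob

# Toy usage with a dummy two-class "classifier" returning probabilities:
rng = np.random.default_rng(0)
img = rng.random((8, 8))
dummy = lambda x: np.array([x.mean(), 1 - x.mean()])
print(compensate_and_classify(img, dummy))
```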



A Non-linear Information Maximisation Algorithm that Performs Blind Separation

Neural Information Processing Systems

With the exception of (Becker 1992), there has been little attempt to use non-linearity in networks to achieve something a linear network could not. Nonlinear networks, however, are capable of computing more general statistics than the second-order ones involved in decorrelation, and as a consequence they are capable of dealing with signals (and noises) which have detailed higher-order structure. The success of the 'H-J' networks at blind separation (Jutten & Herault 1991) suggests that it should be possible to separate statistically independent components by using learning rules which make use of moments of all orders. This paper takes a principled approach to this problem, by starting with the question of how to maximise the information passed on in a nonlinear feed-forward network. Starting with an analysis of a single unit, the approach is extended to a network mapping N inputs to N outputs. In the process, it will be shown that, under certain fairly weak conditions, the N → N network forms a minimally redundant encoding of the inputs, and that it therefore performs Independent Component Analysis (ICA). 2 Information maximisation The information that output Y contains about input X is defined as:

I(Y, X) = H(Y) - H(Y|X)    (1)

where H(Y) is the entropy (information) in the output, while H(Y|X) is whatever information the output has which didn't come from the input. In the case that we have no noise (or rather, we don't know what is noise and what is signal in the input), the mapping between X and Y is deterministic and H(Y|X) has its lowest possible value of
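For concreteness, here is a sketch of the logistic-unit infomax gradient associated with this approach, ΔW ∝ (Wᵀ)⁻¹ + (1 − 2y)xᵀ; the rule is taken from the published version of this line of work, and the sources, mixing matrix, and hyperparameters below are illustrative and may need tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two zero-mean, super-Gaussian sources, linearly mixed: x = A s.
n, T = 2, 5000
s = rng.laplace(size=(n, T))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

W, lr = np.eye(n), 0.01
for _ in range(100):                              # illustrative schedule
    for t in range(0, T, 100):
        xb = x[:, t:t + 100]
        y = 1.0 / (1.0 + np.exp(-(W @ xb)))       # logistic outputs
        # Infomax gradient for logistic units: maximise output entropy,
        # dH/dW = (W^T)^{-1} + (1 - 2y) x^T, averaged over the batch.
        grad = np.linalg.inv(W.T) + ((1.0 - 2.0 * y) @ xb.T) / xb.shape[1]
        W += lr * grad

# As separation improves, W @ A should approach a scaled permutation matrix.
print(W @ A)
```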