Information Technology
Dynamics of Training
Bös, Siegfried, Opper, Manfred
A new method to calculate the full training process of a neural network is introduced. No sophisticated methods like the replica trick are used. The results are directly related to the actual number of training steps. Several results are presented, including the maximal learning rate, an exact description of early stopping, and the necessary number of training steps. Further problems can be addressed with this approach.
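The maximal learning rate mentioned above has a simple illustration for plain gradient descent on a quadratic loss, where the step size must stay below 2 divided by the largest Hessian eigenvalue. The sketch below is only that textbook case (the Hessian, starting point, and step counts are made up), not the paper's treatment of the full training dynamics.

    import numpy as np

    # Illustrative quadratic loss L(w) = 0.5 * w^T H w with a fixed Hessian H.
    H = np.array([[3.0, 0.0], [0.0, 1.0]])
    eta_max = 2.0 / np.linalg.eigvalsh(H).max()   # stability bound for gradient descent

    def train(eta, steps=50):
        w = np.array([1.0, 1.0])
        for _ in range(steps):
            w = w - eta * H @ w                   # gradient step on the quadratic loss
        return np.linalg.norm(w)

    print("eta_max =", eta_max)
    print("below bound:", train(0.9 * eta_max))   # shrinks towards 0
    print("above bound:", train(1.1 * eta_max))   # blows up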
Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons
Furthermore, it is shown that networks of noisy spiking neurons with temporal coding have a strictly larger computational power than sigmoidal neural nets with the same number of units. 1 Introduction and Definitions We consider a formal model SNN for a spiking neuron network that is basically a reformulation of the spike response model (and of the leaky integrate and fire model) without using δ-functions (see [Maass, 1996a] or [Maass, 1996b] for further background).
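For readers unfamiliar with the terminology, the sketch below simulates a generic leaky integrate-and-fire neuron, one of the models the SNN reformulates. All parameters and the constant input current are illustrative assumptions, not the noise model or temporal coding scheme of the paper.

    import numpy as np

    # Generic leaky integrate-and-fire neuron (illustrative parameters only).
    tau, v_thresh, v_reset, dt = 20.0, 1.0, 0.0, 0.1   # ms, arbitrary units
    current = 0.06                                      # constant input current
    v, spike_times = 0.0, []

    for step in range(5000):                            # 500 ms of simulated time
        v += dt * (-v / tau + current)                  # leaky integration
        if v >= v_thresh:                               # threshold crossing -> spike
            spike_times.append(step * dt)
            v = v_reset

    print("number of spikes:", len(spike_times))
    print("first few spike times (ms):", spike_times[:5])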
A Constructive Learning Algorithm for Discriminant Tangent Models
Sona, Diego, Sperduti, Alessandro, Starita, Antonina
To reduce the computational complexity of classification systems based on tangent distance, Hastie et al. (HSS) developed an algorithm that devises rich models for representing large subsets of the data and automatically computes the "best" associated tangent subspace. Schwenk & Milgram proposed a discriminant modular classification system (Diabolo) based on several autoassociative multilayer perceptrons that use tangent distance as the reconstruction error measure. We propose a gradient-based constructive learning algorithm for building a tangent subspace model with discriminant capabilities, combining several of the advantages of both HSS and Diabolo: the devised tangent models have discriminant capabilities; space requirements are reduced with respect to HSS, since our algorithm is discriminant and thus needs fewer prototype models; the dimension of the tangent subspace is determined automatically by the constructive algorithm; and our algorithm is able to learn new transformations.
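As background, tangent distance measures how far a pattern lies from the affine subspace spanned by a prototype and its tangent vectors. The minimal one-sided version below uses a least-squares projection; the prototype, tangent vector, and input are made up, and this is not the constructive discriminant algorithm proposed in the paper.

    import numpy as np

    def tangent_distance(x, prototype, tangents):
        """One-sided tangent distance: distance from x to the affine
        subspace {prototype + tangents @ a}, minimized over a."""
        T = np.atleast_2d(tangents).T            # columns are tangent vectors
        a, *_ = np.linalg.lstsq(T, x - prototype, rcond=None)
        residual = x - prototype - T @ a
        return np.linalg.norm(residual)

    # Toy example with a single translation-like tangent vector.
    prototype = np.array([1.0, 0.0, 0.0])
    tangent = np.array([0.0, 1.0, 0.0])
    x = np.array([1.0, 2.0, 0.5])
    print(tangent_distance(x, prototype, tangent))   # 0.5: the tangent absorbs the y-shift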
Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs
Caruana, Rich, Sa, Virginia R. de
In supervised learning there is usually a clear distinction between inputs and outputs: inputs are what you will measure, outputs are what you will predict from those measurements. This paper shows that the distinction between inputs and outputs is not this simple. Some features are more useful as extra outputs than as inputs. By using a feature as an output, we get more than just its case values: we also learn a mapping from the other inputs to that feature. For many features this mapping may be more useful than the feature value itself.
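The mechanical step of promoting a feature is simply a restructuring of the training data: the chosen column is removed from the input matrix and appended to the target matrix so the network must predict it. The sketch below shows only that restructuring, with made-up data and an arbitrary choice of "poor" feature; which features actually help as outputs is the empirical question the paper studies.

    import numpy as np

    rng = np.random.default_rng(0)
    X_full = rng.normal(size=(100, 5))        # 5 measured features
    y_main = X_full[:, :2].sum(axis=1)        # main prediction target (illustrative)

    poor_idx = 4                              # the feature promoted to a supervisor
    X_inputs = np.delete(X_full, poor_idx, axis=1)
    Y_targets = np.column_stack([y_main, X_full[:, poor_idx]])  # main + auxiliary output

    print(X_inputs.shape, Y_targets.shape)    # (100, 4) (100, 2)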
A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer
Plate, Tony, Band, Pierre, Bert, Joel, Grace, John
Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must provide effective control of overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data. The results suggest that there are no major unanticipated complex features in the real data, and also demonstrate that MacKay's [1995] Bayesian neural network methodology provides effective control of overfitting while retaining the ability to discover complex features in the artificial data. 1 Introduction Traditionally, very simple statistical techniques are used in the analysis of epidemiological studies.
Computing with Infinite Networks
For neural networks with a wide class of weight-priors, it can be shown that in the limit of an infinite number of hidden units the prior over functions tends to a Gaussian process. In this paper analytic forms are derived for the covariance function of the Gaussian processes corresponding to networks with sigmoidal and Gaussian hidden units. This allows predictions to be made efficiently using networks with an infinite number of hidden units, and shows that, somewhat paradoxically, it may be easier to compute with infinite networks than finite ones. 1 Introduction To someone training a neural network by maximizing the likelihood of a finite amount of data it makes no sense to use a network with an infinite number of hidden units; the network will "overfit" the data and so will be expected to generalize poorly. However, the idea of selecting the network size depending on the amount of training data makes little sense to a Bayesian; a model should be chosen that reflects the understanding of the problem, and then application of Bayes' theorem allows inference to be carried out (at least in theory) after the data is observed. In the Bayesian treatment of neural networks, a question immediately arises as to how many hidden units are believed to be appropriate for a task. Neal (1996) has argued compellingly that for real-world problems, there is no reason to believe that neural network models should be limited to nets containing only a "small" number of hidden units. He has shown that it is sensible to consider a limit where the number of hidden units in a net tends to infinity, and that good predictions can be obtained from such models using the Bayesian machinery. He has also shown that for fixed hyperparameters, a large class of neural network models will converge to a Gaussian process prior over functions in the limit of an infinite number of hidden units.
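For the sigmoidal (erf) case, the covariance function of the limiting Gaussian process is usually quoted in the arcsine form sketched below, with the input augmented by a constant 1 for the bias term and Sigma the prior covariance of the input-to-hidden weights. The choice of Sigma and the test inputs here are illustrative assumptions.

    import numpy as np

    def arcsine_kernel(x, z, Sigma):
        """Covariance of the GP limit of an infinite net of erf hidden units
        with N(0, Sigma) input-to-hidden weights; inputs are augmented with 1."""
        xt = np.concatenate(([1.0], x))
        zt = np.concatenate(([1.0], z))
        num = 2.0 * xt @ Sigma @ zt
        den = np.sqrt((1.0 + 2.0 * xt @ Sigma @ xt) * (1.0 + 2.0 * zt @ Sigma @ zt))
        return (2.0 / np.pi) * np.arcsin(num / den)

    # Toy example in one input dimension with an isotropic weight prior.
    Sigma = np.eye(2)                      # prior covariance over (bias, weight)
    print(arcsine_kernel(np.array([0.3]), np.array([-0.1]), Sigma))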
On a Modification to the Mean Field EM Algorithm in Factorial Learning
Dunmur, A. P., Titterington, D. M.
A modification is described to the use of mean field approximations in the E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. The modification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models.
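The core device is the standard second-order Taylor approximation of an expectation, E[f(X)] ≈ f(μ) + ½ f''(μ) Var(X). The sketch below checks this numerically for a logistic function under a Gaussian; the function and distribution are illustrative stand-ins, not the latent profile model of the paper.

    import numpy as np

    def f(x):
        return 1.0 / (1.0 + np.exp(-x))          # logistic function

    mu, var = 0.5, 0.2
    fpp = f(mu) * (1 - f(mu)) * (1 - 2 * f(mu))  # second derivative of the logistic
    approx = f(mu) + 0.5 * fpp * var             # second-order Taylor approximation

    samples = np.random.default_rng(0).normal(mu, np.sqrt(var), 200_000)
    print("Taylor approximation:", approx)
    print("Monte Carlo estimate:", f(samples).mean())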
A Silicon Model of Amplitude Modulation Detection in the Auditory Brainstem
Schaik, André van, Fragnière, Eric, Vittoz, Eric A.
Detection of the periodicity of amplitude modulation is a major step in the determination of the pitch of a sound. In this article we will present a silicon model that uses synchronicity of spiking neurons to extract the fundamental frequency of a sound. It is based on the observation that the so-called 'Choppers' in the mammalian Cochlear Nucleus synchronize well for certain rates of amplitude modulation, depending on the cell's intrinsic chopping frequency. Our silicon model uses three different circuits, i.e., an artificial cochlea, an Inner Hair Cell circuit, and a spiking neuron circuit.
Unsupervised Learning by Convex and Conic Coding
Lee, Daniel D., Seung, H. Sebastian
Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis.
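A conic encoder amounts to a nonnegative least-squares problem: find the nonnegative combination of basis vectors closest to the input (the convex encoder additionally constrains the coefficients to sum to one). The sketch below solves only the conic case with SciPy's NNLS routine on made-up data; the learning rule that updates the basis is not shown.

    import numpy as np
    from scipy.optimize import nnls

    # Conic encoding: min ||W a - x|| subject to a >= 0, with basis vectors
    # as the columns of W (illustrative basis and input).
    W = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5]])           # three basis vectors in 2-D
    x = np.array([0.7, 0.3])

    a, residual = nnls(W, x)
    print("conic code a:", a)
    print("reconstruction:", W @ a, "residual:", residual)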