Tangent Prop - A formalism for specifying selected invariances in an adaptive network
Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John
In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.
Kernel Regression and Backpropagation Training With Noise
Koistinen, Petri, Holmstrรถm, Lasse
One method proposed for improving the generalization capability of a feedforward network trained with the backpropagation algorithm is to use artificial training vectors which are obtained by adding noise to the original training vectors. We discuss the connection of such backpropagation training with noise to kernel density and kernel regression estimation. We compare by simulated examples (1) backpropagation, (2) backpropagation with noise, and (3) kernel regression in mapping estimation and pattern classification contexts.
Hierarchical Transformation of Space in the Visual System
Pouget, Alexandre, Fisher, Stephen A., Sejnowski, Terrence J.
Neurons encoding simple visual features in area VI such as orientation, direction of motion and color are organized in retinotopic maps. However, recent physiological experiments have shown that the responses of many neurons in VI and other cortical areas are modulated by the direction of gaze. We have developed a neural network model of the visual cortex to explore the hypothesis that visual features are encoded in headcentered coordinates at early stages of visual processing. New experiments are suggested for testing this hypothesis using electrical stimulations and psychophysical observations.
Refining PID Controllers using Neural Networks
Scott, Gary M., Shavlik, Jude W., Ray, W. Harmon
We apply this method to the task of controlling the outflow and temperature of a water tank, producing statistically-significant gains in accuracy over both a standard neural network approach and a non-learning PID controller. Furthermore, using the PID knowledge to initialize the weights of the network produces statistically less variation in testset accuracy when compared to networks initialized with small random numbers.
Against Edges: Function Approximation with Multiple Support Maps
Darrell, Trevor, Pentland, Alex
Networks for reconstructing a sparse or noisy function often use an edge field to segment the function into homogeneous regions, This approach assumes that these regions do not overlap or have disjoint parts, which is often false. For example, images which contain regions split by an occluding object can't be properly reconstructed using this type of network. We have developed a network that overcomes these limitations, using support maps to represent the segmentation of a signal. In our approach, the support of each region in the signal is explicitly represented. Results from an initial implementation demonstrate that this method can reconstruct images and motion sequences which contain complicated occlusion.
Principled Architecture Selection for Neural Networks: Application to Corporate Bond Rating Prediction
The notion of generalization ability can be defined precisely as the prediction risk, the expected performance of an estimator in predicting new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select an optimal network architecture from a set of possible architectures. We also propose a heuristic search strategy to explore the space of possible architectures. The prediction risk is estimated from the available data; here we estimate the prediction risk by v-fold cross-validation and by asymptotic approximations of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of a complete a priori model which could be used to impose a structure to the network architecture.