 feedforward net


Expressivity of Quadratic Neural ODEs

arXiv.org Artificial Intelligence

This work focuses on deriving quantitative approximation error bounds for neural ordinary differential equations having at most quadratic nonlinearities in the dynamics. The simple dynamics of this model form demonstrate how expressivity can be derived primarily from iteratively composing many elementary operations, rather than from the complexity of those operations themselves. As with the analog differential analyzer and universal polynomial DAEs, the expressivity derives primarily from the "depth" of the model. These results contribute to our understanding of what depth specifically imparts to the capabilities of deep learning architectures.
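To make "at most quadratic nonlinearities in the dynamics" concrete, here is a minimal sketch of an ODE whose right-hand side is a constant plus a linear plus a quadratic term, integrated with a fixed-step Runge-Kutta scheme. The parameter names `A`, `B`, `c`, the helper names, and the choice of integrator are assumptions made for illustration; the abstract does not specify the paper's exact model form.

```python
def quadratic_rhs(x, A, B, c):
    """dx_i/dt = c_i + sum_j A[i][j] x_j + sum_{j,k} B[i][j][k] x_j x_k."""
    d = len(x)
    out = []
    for i in range(d):
        linear = sum(A[i][j] * x[j] for j in range(d))
        quadratic = sum(B[i][j][k] * x[j] * x[k]
                        for j in range(d) for k in range(d))
        out.append(c[i] + linear + quadratic)
    return out


def rk4_step(x, h, A, B, c):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = quadratic_rhs(x, A, B, c)
    k2 = quadratic_rhs(shifted(x, k1, 0.5 * h), A, B, c)
    k3 = quadratic_rhs(shifted(x, k2, 0.5 * h), A, B, c)
    k4 = quadratic_rhs(shifted(x, k3, h), A, B, c)
    return [xi + (h / 6.0) * (a + 2.0 * b + 2.0 * g + e)
            for xi, a, b, g, e in zip(x, k1, k2, k3, k4)]


def integrate(x0, A, B, c, t_end=1.0, steps=1000):
    """Integrate from t = 0 to t = t_end with a fixed number of RK4 steps."""
    x = list(x0)
    h = t_end / steps
    for _ in range(steps):
        x = rk4_step(x, h, A, B, c)
    return x


# One-dimensional check: dx/dt = -x^2 with x(0) = 1 has the exact
# solution x(t) = 1 / (1 + t), so x(1) should be close to 0.5.
x_final = integrate([1.0], A=[[0.0]], B=[[[-1.0]]], c=[0.0], t_end=1.0)
```

In this sketch the number of integration steps plays the role of depth: each step applies the same simple quadratic map, and the overall input-output function grows more complex only through repeated composition.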


Generalization and Parameter Estimation in Feedforward Nets: Some Experiments

Neural Information Processing Systems

We have done an empirical study of the relation of the number of parameters (weights) in a feedforward net to generalization performance. Two experiments are reported. In one, we use simulated data sets with well-controlled parameters, such as the signal-to-noise ratio of continuous-valued data. In the second, we train the network on vector-quantized mel cepstra from real speech samples. In each case, we use back-propagation to train the feedforward net to discriminate in a multiple class pattern classification problem. We report the results of these studies, and show the application of cross-validation techniques to prevent overfitting.
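The abstract does not say how cross-validation is applied. One common form, sketched below as an assumption rather than as the authors' procedure, is to monitor the error on a held-out validation set during training and keep the weights from the epoch where that error was lowest. The function name `early_stopping_epoch` and the `patience` parameter are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training is considered stopped once the validation loss has failed
    to improve on its best value for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Validation loss falls, then rises as the net starts to overfit.
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 0.6]
kept = early_stopping_epoch(losses, patience=3)
```

With `patience=3` the loop stops at epoch 5 and keeps epoch 2; a larger patience would let training continue long enough to reach the later minimum at epoch 6, which illustrates the trade-off in choosing that parameter.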


Recurrent Networks: Second Order Properties and Pruning

Neural Information Processing Systems

Second order properties of cost functions for recurrent networks are investigated. We analyze a layered fully recurrent architecture; the virtue of this architecture is that it features the conventional feedforward architecture as a special case. A detailed description of recursive computation of the full Hessian of the network cost function is provided. We discuss the possibility of invoking simplifying approximations of the Hessian and show how weight decays iron the cost function and thereby greatly assist training. We present tentative pruning results, using Hassibi et al.'s Optimal Brain Surgeon, demonstrating that recurrent networks can construct an efficient internal memory.

1 LEARNING IN RECURRENT NETWORKS

Time series processing is an important application area for neural networks and numerous architectures have been suggested, see e.g. (Weigend and Gershenfeld, 94). The most general structure is a fully recurrent network and it may be adapted using Real Time Recurrent Learning (RTRL) suggested by (Williams and Zipser, 89). By invoking a recurrent network, the length of the network memory can be adapted to the given time series, while it is fixed for the conventional lag-space net (Weigend et al., 90). In forecasting, however, feedforward architectures remain the most popular structures; only a few applications are reported based on the Williams & Zipser approach.
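The Optimal Brain Surgeon step named in the abstract relies on the inverse Hessian of the cost function. The sketch below shows a single pruning step in its standard form: each weight gets a saliency w_q^2 / (2 [H^-1]_qq), the weight with the smallest saliency is removed, and the remaining weights are adjusted to compensate. It assumes the inverse Hessian is already available as a plain nested list; the function name `obs_prune_step` and the toy numbers are illustrative and are not taken from the paper.

```python
def obs_prune_step(w, H_inv):
    """One Optimal Brain Surgeon step.

    w      -- list of weights at a (local) minimum of the cost
    H_inv  -- inverse Hessian of the cost as a nested list (assumed given)
    Returns (index of pruned weight, its saliency, adjusted weights).
    """
    d = len(w)
    # Estimated increase in cost if weight q is deleted.
    saliency = [w[q] ** 2 / (2.0 * H_inv[q][q]) for q in range(d)]
    q = min(range(d), key=lambda i: saliency[i])
    # Compensating update: delta_w = -(w_q / [H^-1]_qq) * H^-1[:, q]
    scale = w[q] / H_inv[q][q]
    w_new = [w[i] - scale * H_inv[i][q] for i in range(d)]
    w_new[q] = 0.0  # exactly zero by construction; set explicitly for clarity
    return q, saliency[q], w_new


# Toy example with two correlated weights.
pruned, cost_increase, w_after = obs_prune_step(
    [1.0, 0.1], [[1.0, 0.5], [0.5, 1.0]]
)
```

Here the small second weight is pruned, and the first weight moves from 1.0 to 0.95 to absorb part of its effect. That compensation through the off-diagonal terms is what requires second-order information and distinguishes this step from simple magnitude pruning.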


Remarks on Interpolation and Recognition Using Neural Nets

Neural Information Processing Systems

We consider different types of single-hidden-layer feedforward nets: with or without direct input to output connections, and using either threshold or sigmoidal activation functions. The main results show that direct connections in threshold nets double the recognition but not the interpolation power, while using sigmoids rather than thresholds allows (at least) doubling both. Various results are also given on VC dimension and other measures of recognition capabilities.
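The architectural variants compared in the abstract can be sketched as one forward pass with two switches: the activation function (threshold or sigmoid) and an optional set of direct input-to-output weights. The function names and the hand-picked weights below are assumptions chosen for illustration and do not come from the paper. The example uses the well-known construction in which a threshold net with a single hidden unit computes XOR once direct connections are present.

```python
import math


def heaviside(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_w, hidden_b, out_w, out_b, direct_w=None, act=heaviside):
    """Single-hidden-layer net; `direct_w` adds input-to-output connections."""
    hidden = [act(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(hidden_w, hidden_b)]
    y = out_b + sum(v * h for v, h in zip(out_w, hidden))
    if direct_w is not None:
        y += sum(d * xi for d, xi in zip(direct_w, x))
    return y


def xor_net(x):
    """XOR with ONE hidden threshold unit plus direct connections.

    The hidden unit computes AND(x1, x2); the output thresholds
    x1 + x2 - 2 * AND(x1, x2) - 0.5.
    """
    y = forward(x, hidden_w=[[1.0, 1.0]], hidden_b=[-1.5],
                out_w=[-2.0], out_b=-0.5, direct_w=[1.0, 1.0])
    return heaviside(y)


xor_table = [xor_net([a, b]) for a in (0.0, 1.0) for b in (0.0, 1.0)]
```

Passing `act=sigmoid` to `forward` gives the sigmoidal variant of the same architecture, and omitting `direct_w` gives the net without direct connections.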

