
Collaborating Authors

 Moody, John E.


Fast Pruning Using Principal Components

Neural Information Processing Systems

In this procedure one transforms variables to a basis in which the covariance is diagonal and then projects out the low variance directions. While application of PCA to remove input variables is useful in some cases (Leen et al., 1990), there is no guarantee that low variance variables have little effect on error. We propose a saliency measure, based on PCA, that identifies those variables that have the least effect on error. Our proposed Principal Components Pruning algorithm applies this measure to obtain a simple and cheap pruning technique in the context of supervised learning. In the special case of linear regression with unbiased linear models, one can bound the bias introduced by pruning the principal degrees of freedom in the model.
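
As a rough illustration of the pruning step described above, the following numpy sketch diagonalizes the input covariance and ranks eigendirections by an error-based saliency rather than by variance alone. The function names, the saliency lambda_i * w_i^2 (the linear-regression special case), and the choice to prune whole eigendirections are assumptions made for this example, not a verbatim statement of the Principal Components Pruning algorithm.

import numpy as np

def pcp_saliencies(U, w):
    # U: (n_samples, n_inputs) input matrix; w: (n_inputs,) trained weights
    # of a linear model y = U @ w. Returns eigenvalues, eigenvectors, and a
    # per-direction saliency estimating the error increase from pruning it.
    C = np.cov(U, rowvar=False)                # input covariance
    eigvals, eigvecs = np.linalg.eigh(C)       # diagonalizing (PCA) basis
    w_tilde = eigvecs.T @ w                    # weights expressed in that basis
    saliency = eigvals * w_tilde**2            # effect on error, not variance alone
    return eigvals, eigvecs, saliency

def prune_low_saliency(w, eigvecs, saliency, k):
    # Remove the k eigendirections with the smallest saliency and map the
    # weights back to the original basis.
    keep = np.argsort(saliency)[k:]            # retained directions
    P = eigvecs[:, keep] @ eigvecs[:, keep].T  # projector onto retained subspace
    return P @ w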


Fast Pruning Using Principal Components

Neural Information Processing Systems

The assumption is that there exists an underlying (possibly noisy) functional relationship relating the outputs to the inputs, y = f(u, e), where e denotes the noise. The aim of the learning process is to approximate this relationship based on the training set.


Weight Space Probability Densities in Stochastic Learning: I. Dynamics and Equilibria

Neural Information Processing Systems

The ensemble dynamics of stochastic learning algorithms can be studied using theoretical techniques from statistical physics. We develop the equations of motion for the weight space probability densities for stochastic learning algorithms. We discuss equilibria in the diffusion approximation and provide expressions for special cases of the LMS algorithm. The equilibrium densities are not in general thermal (Gibbs) distributions in the objective function being minimized, but rather depend upon an effective potential that includes diffusion effects. Finally we present an exact analytical expression for the time evolution of the density for a learning algorithm with weight updates proportional to the sign of the gradient.
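
A minimal simulation sketch of the kind of ensemble the analysis concerns: many independent weights following a noisy sign-of-gradient update on a one-dimensional quadratic objective, with the ensemble histogram approximating the weight-space density. The objective, the noise model, and all parameter values are illustrative assumptions, not the paper's setup.

import numpy as np

# Ensemble of 1-D weights under a sign-of-gradient update on E(w) = 0.5*(w - w_star)^2
# with per-step gradient noise; the histogram of the ensemble approximates the
# weight-space probability density whose evolution is treated analytically above.
rng = np.random.default_rng(0)
n_ensemble, n_steps = 10_000, 2_000
eta, w_star, noise = 0.01, 1.0, 0.5

w = rng.normal(0.0, 2.0, size=n_ensemble)      # initial density: broad Gaussian
for _ in range(n_steps):
    grad = (w - w_star) + noise * rng.normal(size=n_ensemble)  # noisy gradient
    w -= eta * np.sign(grad)                    # update proportional to sign(grad)

density, edges = np.histogram(w, bins=100, density=True)  # equilibrium estimate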


The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems

Neural Information Processing Systems

We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions.
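
The analysis leads to a relation of roughly the following form (the notation here is an assumption for illustration; see the paper for the exact conditions and definitions):

\langle E_{\mathrm{test}} \rangle \;\approx\; \langle E_{\mathrm{train}} \rangle + \frac{2\,\hat{\sigma}^{2}}{n}\, p_{\mathrm{eff}}(\lambda),

where n is the training-set size, \hat{\sigma}^{2} an estimate of the noise variance, and p_{\mathrm{eff}}(\lambda) the effective number of parameters, which shrinks as the regularization strength \lambda grows.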


Note on Learning Rate Schedules for Stochastic Optimization

Neural Information Processing Systems

We present and compare learning rate schedules for stochastic gradient descent, a general algorithm which includes LMS, online backpropagation and k-means clustering as special cases. We introduce "search-then-converge" type schedules which outperform the classical constant and "running average" (1/t) schedules both in speed of convergence and quality of solution.
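
One common way to write these schedules is sketched below in Python for concreteness (parameter names are illustrative): the search-then-converge schedule holds the rate near eta0 while t << tau and decays like eta0*tau/t afterwards, whereas the classical running-average schedule decays like 1/t from the start.

def constant(eta0):
    # classical constant schedule: eta_t = eta0
    return lambda t: eta0

def running_average(eta0):
    # classical "running average" schedule: eta_t = eta0 / (1 + t)
    return lambda t: eta0 / (1.0 + t)

def search_then_converge(eta0, tau):
    # search-then-converge: ~eta0 for t << tau, ~eta0 * tau / t for t >> tau
    return lambda t: eta0 / (1.0 + t / tau)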


Note on Development of Modularity in Simple Cortical Models

Neural Information Processing Systems

We show that localized activity patterns in a layer of cells, collective excitations, can induce the formation of modular structures in the anatomical connections via a Hebbian learning mechanism. The networks are spatially homogeneous before learning, but the spontaneous emergence of localized collective excitations, and subsequently of modularity in the connection patterns, breaks translational symmetry. This spontaneous symmetry breaking phenomenon is similar to those that drive pattern formation in reaction-diffusion systems. We have identified requirements on the patterns of lateral connections and on the gains of internal units which are essential for the development of modularity. These essential requirements will most likely remain operative when more complicated (and biologically realistic) models are considered.
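
A caricature of the mechanism, with every modeling choice an assumption made for illustration rather than the paper's model: a one-dimensional layer with a fixed center-surround (Mexican-hat) lateral kernel produces localized collective excitations, and a normalized Hebbian rule on the feedforward weights then imprints that localized structure.

import numpy as np

# Illustrative caricature: fixed Mexican-hat lateral connections shape the layer's
# activity into localized excitations; a Hebbian update on the feedforward weights
# then absorbs that structure. Kernel shapes, gains, and parameters are assumptions.
rng = np.random.default_rng(1)
n_in, n_out, n_steps, lr = 40, 40, 500, 0.05

x = np.arange(n_out)
d = np.abs(x[:, None] - x[None, :])
lateral = np.exp(-d**2 / 8.0) - 0.5 * np.exp(-d**2 / 32.0)    # Mexican-hat kernel

W = rng.normal(0.0, 0.1, size=(n_out, n_in))                  # feedforward weights
for _ in range(n_steps):
    u = rng.normal(size=n_in)                                  # input pattern
    a = np.maximum(lateral @ (W @ u), 0.0)                     # laterally shaped activity
    W += lr * np.outer(a, u)                                   # Hebbian update
    W /= np.linalg.norm(W, axis=1, keepdims=True)              # keep weights bounded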