
Collaborating Authors

 Moody, John


Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Neural Information Processing Systems

Visualization of input data and feature selection are intimately related. A good feature selection algorithm can identify meaningful coordinate projections for low dimensional data visualization. Conversely, a good visualization technique can suggest meaningful features to include in a model. Input variable selection is the most important step in the model selection process. Given a target variable, a set of input variables can be selected as explanatory variables by some prior knowledge.
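As a minimal illustration of dependence-based input selection for nongaussian data (a sketch, not the paper's algorithm; `mutual_information` and `rank_features` are hypothetical helper names), one can rank candidate input variables by their estimated mutual information with the target, using simple histogram binning:

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Estimate I(X; Y) in nats from samples via 2-D histogram binning."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                    # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)          # marginal of X
    py = pxy.sum(axis=0, keepdims=True)          # marginal of Y
    nz = pxy > 0                                 # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def rank_features(X, y, bins=10):
    """Rank columns of X by estimated mutual information with target y."""
    scores = [mutual_information(X[:, j], y, bins) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: scores[j], reverse=True)
```

Unlike correlation, the mutual information score makes no Gaussian or linearity assumption, which is the point of moving beyond second-order statistics for nongaussian inputs.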


Towards Faster Stochastic Gradient Search

Neural Information Processing Systems

Stochastic gradient descent is a general algorithm which includes LMS, online backpropagation, and adaptive k-means clustering as special cases.
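To make the "special case" claim concrete, here is a sketch (function name `sgd_lms` is my own) of stochastic gradient descent on squared error for a linear unit; with this choice of model and loss, the per-sample update is exactly the LMS (Widrow-Hoff) rule:

```python
import numpy as np

def sgd_lms(X, y, lr=0.01, epochs=50, seed=0):
    """Stochastic gradient descent on 0.5 * (y_t - w.x_t)**2, one sample at
    a time. For a linear unit this is the LMS rule:
        w <- w + lr * (y_t - w.x_t) * x_t
    Swapping in a multilayer network gives online backpropagation; swapping
    in nearest-centroid assignment gives adaptive k-means."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for t in rng.permutation(len(X)):
            err = y[t] - w @ X[t]
            w += lr * err * X[t]   # negative gradient of the sample loss
    return w
```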


Networks with Learned Unit Response Functions

Neural Information Processing Systems

Feedforward networks composed of units which compute a sigmoidal function of a weighted sum of their inputs have been much investigated. We tested the approximation and estimation capabilities of networks using functions more complex than sigmoids. Three classes of functions were tested: polynomials, rational functions, and flexible Fourier series. Unlike sigmoids, these classes can fit nonmonotonic functions. They were compared on three problems: prediction of Boston housing prices, the sunspot count, and robot arm inverse dynamics. The complex units attained clearly superior performance on the robot arm problem, which is a highly nonmonotonic, pure approximation problem. On the noisy and only mildly nonlinear Boston housing and sunspot problems, differences among the complex units were revealed; polynomials did poorly, whereas rationals and flexible Fourier series were comparable to sigmoids.
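One of the abstract's unit classes, the flexible Fourier series, can be sketched as follows (a minimal illustration under my own parameterization, not the paper's exact form): the unit applies a truncated Fourier series, rather than a sigmoid, to the weighted input sum.

```python
import numpy as np

def fourier_unit(x, w, a, b, omega):
    """One hidden unit: a flexible Fourier series of the weighted sum s = w.x,
        g(s) = a[0] + sum_k a[k] * cos(k*omega*s) + b[k] * sin(k*omega*s).
    Unlike a sigmoid, g(s) can be nonmonotonic, which is what lets such units
    fit highly nonmonotonic targets. a has K+1 entries, b has K."""
    s = w @ x
    k = np.arange(1, len(b) + 1)
    return a[0] + a[1:] @ np.cos(k * omega * s) + b @ np.sin(k * omega * s)
```

In a learned network the coefficients `a`, `b` and frequency `omega` would be trained along with the input weights `w`, so each unit's response function is itself adapted to the data.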


Principled Architecture Selection for Neural Networks: Application to Corporate Bond Rating Prediction

Neural Information Processing Systems

The notion of generalization ability can be defined precisely as the prediction risk, the expected performance of an estimator in predicting new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select an optimal network architecture from a set of possible architectures. We also propose a heuristic search strategy to explore the space of possible architectures. The prediction risk is estimated from the available data; here we estimate the prediction risk by v-fold cross-validation and by asymptotic approximations of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of a complete a priori model which could be used to impose a structure to the network architecture.
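The v-fold cross-validation estimate of prediction risk described above can be sketched like this (a generic illustration with squared-error loss; the function name and the `fit`/`predict` callables are my own, not the paper's interface):

```python
import numpy as np

def cv_prediction_risk(X, y, fit, predict, v=5, seed=0):
    """Estimate prediction risk (expected squared error on new observations)
    by v-fold cross-validation: partition the data into v folds, train on
    v-1 folds, score on the held-out fold, and average the v scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), v)
    risks = []
    for i in range(v):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(v) if j != i])
        model = fit(X[train], y[train])
        risks.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(risks))
```

Architecture selection then amounts to computing this estimate for each candidate network and keeping the one with the lowest estimated risk; the asymptotic approximations mentioned in the abstract (generalized cross-validation, final prediction error) avoid the v retrainings at the cost of stronger assumptions.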


Fast Learning in Multi-Resolution Hierarchies

Neural Information Processing Systems

A variety of approaches to adaptive information processing have been developed by workers in disparate disciplines. These include the large body of literature on approximation and interpolation techniques (curve and surface fitting), the linear, real-time adaptive signal processing systems (such as the adaptive linear combiner and the Kalman filter), and most recently, the reincarnation of nonlinear neural network models such as the multilayer perceptron. Each of these methods has its strengths and weaknesses. The curve and surface fitting techniques are excellent for off-line data analysis, but are typically not formulated with real-time applications in mind. The linear techniques of adaptive signal processing and adaptive control are well-characterized, but are limited to applications for which linear descriptions are appropriate. Finally, neural network learning models such as back propagation have proven extremely versatile at learning a wide variety of nonlinear mappings, but tend to be very slow computationally and are not yet well characterized.