AITopics

Stochastic gradient descent is a general algorithm which includes LMS, online backpropagation, and adaptive k-means clustering as special cases.

artificial intelligence, converge, machine learning, (14 more...)

Country: North America > United States > California (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Gradient Descent: Second Order Momentum and Saturating Error

Pearlmutter, Barak

We then regard gradient descent with momentum as a dynamic system and explore a non quadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics. 1 INTRODUCTION Gradient descent is the bread-and-butter optimization technique in neural networks. Some people build special purpose hardware to accelerate gradient descent optimization ofbackpropagation networks. Understanding the dynamics of gradient descent on such surfaces is therefore of great practical value. Here we briefly review the known results in the convergence of batch gradient descent; showthat second-order momentum does not give any speedup; simulate a real network and observe some effect not predicted by theory; and account for these effects by analyzing gradient descent with momentum on a saturating error surface.

artificial intelligence, convergence, machine learning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Moody, John, Utans, Joachim

Principled Architecture Selection for Neural Networks: Application to Corporate Bond Rating Prediction

The notion of generalization ability can be defined precisely as the prediction risk,the expected performance of an estimator in predicting new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select an optimal network architecture from a set of possible architectures. Wealso propose a heuristic search strategy to explore the space of possible architectures. The prediction risk is estimated from the available data; here we estimate the prediction risk by v-fold cross-validation and by asymptotic approximations of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of a complete a priori model which could be used to impose a structure to the network architecture.

architecture, artificial intelligence, machine learning, (14 more...)

Country: North America > United States > New York (0.14)

Industry: Banking & Finance > Credit (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Merging Constrained Optimisation with Deterministic Annealing to "Solve" Combinatorially Hard Problems

Stolorz, Paul

Several parallel analogue algorithms, based upon mean field theory (MFT) approximations to an underlying statistical mechanics formulation, and requiring anexternally prescribed annealing schedule, now exist for finding approximate solutions to difficult combinatorial optimisation problems. They have been applied to the Travelling Salesman Problem (TSP), as well as to various issues in computational vision and cluster analysis. I show here that any given MFT algorithm can be combined in a natural way with notions from the areas of constrained optimisation and adaptive simulated annealing to yield a single homogenous and efficient parallel relaxation technique,for which an externally prescribed annealing schedule is no longer required. The results of numerical simulations on 50-city and 100-city TSP problems are presented, which show that the ensuing algorithms aretypically an order of magnitude faster than the MFT algorithms alone, and which also show, on occasion, superior solutions as well. 1 INTRODUCTION Several promising parallel analogue algorithms, which can be loosely described by the term "deterministic annealing", or "mean field theory (MFT) annealing", have *also at Theoretical Division and Center for Nonlinear Studies, MSB213, Los Alamos National Laboratory, Los Alamos, NM 87545.

algorithm, artificial intelligence, machine learning, (15 more...)

Country: North America > United States > New Mexico > Los Alamos County > Los Alamos (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Sanger, Terence D., Sutton, Richard S., Matheus, Christopher J.

Iterative Construction of Sparse Polynomial Approximations

Terence D. Sanger Massachusetts Institute of Technology Room E25-534 Cambridge, MA 02139 tds@ai.mit.edu Abstract We present an iterative algorithm for nonlinear regression based on construction ofsparse polynomials. Polynomials are built sequentially from lower to higher order. Selection of new terms is accomplished using a novel look-ahead approach that predicts whether a variable contributes to the remaining error. The algorithm is based on the tree-growing heuristic in LMS Trees which we have extended to approximation of arbitrary polynomials ofthe input features.

artificial intelligence, machine learning, polynomial, (14 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.54)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Jordan, Michael I., Jacobs, Robert A.

Hierarchies of adaptive experts

Another class of nonlinear algorithms, exemplified by CART (Breiman, Friedman, Olshen, & Stone, 1984) and MARS (Friedman, 1990), generalizes classicaltechniques by partitioning the training data into non-overlapping regions and fitting separate models in each of the regions. These two classes of algorithms extendlinear techniques in essentially independent directions, thus it seems worthwhile to investigate algorithms that incorporate aspects of both approaches to model estimation. Such algorithms would be related to CART and MARS as multilayer neural networks are related to linear statistical techniques.

architecture, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > Massachusetts (0.14)
North America > United States > Iowa (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Comparison of Projection Pursuit and Neural Network Regression Modeling

Huang, Jenq-Neng, Li, Hang, Maechler, Martin, Martin, R. Douglas, Schimert, Jim

Two projection based feedforward network learning methods for modelfree regressionproblems are studied and compared in this paper: one is the popular back-propagation learning (BPL); the other is the projection pursuit learning (PPL).

artificial intelligence, machine learning, neuron, (16 more...)

Country: North America > United States > Washington > King County > Seattle (0.15)

Industry: Energy (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.41)

Wettschereck, Dietrich, Dietterich, Thomas

Improving the Performance of Radial Basis Function Networks by Learning Center Locations

Three methods for improving the performance of (gaussian) radial basis function (RBF) networks were tested on the NETtaik task. In RBF, a new example is classified by computing its Euclidean distance to a set of centers chosen by unsupervised methods. The application of supervised learning to learn a non-Euclidean distance metric was found to reduce the error rate of RBF networks, while supervised learning of each center's variance resultedin inferior performance. The best improvement in accuracy was achieved by networks called generalized radial basis function (GRBF) networks. In GRBF, the center locations are determined by supervised learning. After training on 1000 words, RBF classifies 56.5% of letters correct, while GRBF scores 73.4% letters correct (on a separate test set). From these and other experiments, we conclude that supervised learning of center locations can be very important for radial basis function learning.

artificial intelligence, machine learning, supervised learning, (15 more...)

Country:

North America > United States > Oregon (0.29)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education > Educational Setting (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

II, J. B. Hampshire, Kumar, B. V. K. Vijaya

Shooting Craps in Search of an Optimal Strategy for Training Connectionist Pattern Classifiers

We compare two strategies for training connectionist (as well as nonconnectionist) modelsfor statistical pattern recognition. The probabilistic strategy is based on the notion that Bayesian discrimination (i.e.- optimal classification) isachieved when the classifier learns the a posteriori class distributions of the random feature vector. The differential strategy is based on the notion that the identity of the largest class a posteriori probability of the feature vector is all that is needed to achieve Bayesian discrimination. Each strategy is directly linked to a family ofobjective functions that can be used in the supervised training procedure. We prove that the probabilistic strategy - linked with error measure objective functions such as mean-squared-error and cross-entropy - typically used to train classifiers necessarily requires larger training sets and more complex classifier architectures than those needed to approximate the Bayesian discriminant function.In contrast.

artificial intelligence, machine learning, pattern recognition, (18 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.57)

Bridle, John S., Heading, Anthony J. R., MacKay, David J. C.

Unsupervised Classifiers, Mutual Information and 'Phantom Targets

We derive criteria for training adaptive classifier networks to perform unsupervised dataanalysis. The first criterion turns a simple Gaussian classifier into a simple Gaussian mixture analyser. The second criterion, which is much more generally applicable, is based on mutual information.

artificial intelligence, classifier, machine learning, (16 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)