AITopics

Two projection based feedforward network learning methods for modelfree regression problems are studied and compared in this paper: one is the popular back-propagation learning (BPL); the other is the projection pursuit learning (PPL).

neuron, perceptron, ppl, (14 more...)

Country:

North America > United States > Washington > King County > Seattle (0.15)
North America > United States > California > Monterey County > Pacific Grove (0.05)
North America > United States > District of Columbia > Washington (0.04)

Industry: Energy (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.41)

A Network of Localized Linear Discriminants

Glassman, Martin S.

The localized linear discriminant network (LLDN) has been designed to address classification problems containing relatively closely spaced data from different classes (encounter zones [1], the accuracy problem [2]). Locally trained hyperplane segments are an effective way to define the decision boundaries for these regions [3]. The LLD uses a modified perceptron training algorithm for effective discovery of separating hyperplane/sigmoid units within narrow boundaries. The basic unit of the network is the discriminant receptive field (DRF) which combines the LLD function with Gaussians representing the dispersion of the local training data with respect to the hyperplane. The DRF implements a local distance measure [4], and obtains the benefits of networks oflocalized units [5]. A constructive algorithm for the two-class case is described which incorporates DRF's into the hidden layer to solve local discrimination problems. The output unit produces a smoothed, piecewise linear decision boundary. Preliminary results indicate the ability of the LLDN to efficiently achieve separation when boundaries are narrow and complex, in cases where both the "standard" multilayer perceptron (MLP) and k-nearest neighbor (KNN) yield high error rates on training data. 1 The LLD Training Algorithm and DRF Generation The LLD is defined by the hyperplane normal vector V and its "midpoint" M (a translated origin [1] near the center of gravity of the training data in feature space).

boundary, dispersion, drf, (14 more...)

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Bridle, John S., Heading, Anthony J. R., MacKay, David J. C.

Unsupervised Classifiers, Mutual Information and 'Phantom Targets

We derive criteria for training adaptive classifier networks to perform unsupervised data analysis. The first criterion turns a simple Gaussian classifier into a simple Gaussian mixture analyser. The second criterion, which is much more generally applicable, is based on mutual information.

classifier, mutual information, unsupervised classifier, (13 more...)

Country: North America > United States > California > Los Angeles County > Pasadena (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Data Analysis using G/SPLINES

Rogers, David

G/SPLINES is an algorithm for building functional models of data. It uses genetic search to discover combinations of basis functions which are then used to build a least-squares regression model. Because it produces a population of models which evolve over time rather than a single model, it allows analysis not possible with other regression-based approaches. 1 INTRODUCTION G/SPLINES is a hybrid of Friedman's Multivariable Adaptive Regression Splines (MARS) algorithm (Friedman, 1990) with Holland's Genetic Algorithm (Holland, 1975). G/SPLINES has advantages over MARS in that it requires fewer least-squares computations, is easily extendable to non-spline basis functions, may discover models inaccessible to local-variable selection algorithms, and allows significantly larger problems to be considered. These issues are discussed in (Rogers, 1991). This paper begins with a discussion of linear regression models, followed by a description of the G/SPLINES algorithm, and finishes with a series of experiments illustrating its performance, robustness, and analysis capabilities.

algorithm, basis function, opération, (14 more...)

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Industry: Government (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Node Splitting: A Constructive Algorithm for Feed-Forward Neural Networks

Wynne-Jones, Mike

A constructive algorithm is proposed for feed-forward neural networks, which uses node-splitting in the hidden layers to build large networks from smaller ones. The small network forms an approximate model of a set of training data, and the split creates a larger more powerful network which is initialised with the approximate solution already found. The insufficiency of the smaller network in modelling the system which generated the data leads to oscillation in those hidden nodes whose weight vectors cover regions in the input space where more detail is required in the model. These nodes are identified and split in two using principal component analysis, allowing the new nodes t.o cover the two main modes of each oscillating vector. Nodes are selected for splitting using principal component analysis on the oscillating weight vectors, or by examining the Hessian matrix of second derivatives of the network error with respect to the weight.s.

node, node splitting, variance, (11 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
Europe > United Kingdom (0.05)
North America > United States > California > San Diego County > La Jolla (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.45)

Sanger, Terence D., Sutton, Richard S., Matheus, Christopher J.

Iterative Construction of Sparse Polynomial Approximations

Terence D. Sanger Richard S. Sutton Christopher J. Matheus Massachusetts Institute GTE Laboratories GTE Laboratories of Technology Incorporated Incorporated Room E25-534 40 Sylvan Road 40 Sylvan Road Cambridge, MA 02139 Waltham, MA 02254 Waltham, MA 02254 tds@ai.mit.edu Abstract We present an iterative algorithm for nonlinear regression based on construction of sparse polynomials. Polynomials are built sequentially from lower to higher order. Selection of new terms is accomplished using a novel look-ahead approach that predicts whether a variable contributes to the remaining error. The algorithm is based on the tree-growing heuristic in LMS Trees which we have extended to approximation of arbitrary polynomials of the input features.

algorithm, polynomial, regression, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.54)
North America > United States > Massachusetts > Middlesex County > Waltham (0.45)
North America > United States > New York (0.04)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Merging Constrained Optimisation with Deterministic Annealing to "Solve" Combinatorially Hard Problems

Stolorz, Paul

Several parallel analogue algorithms, based upon mean field theory (MFT) approximations to an underlying statistical mechanics formulation, and requiring an externally prescribed annealing schedule, now exist for finding approximate solutions to difficult combinatorial optimisation problems. They have been applied to the Travelling Salesman Problem (TSP), as well as to various issues in computational vision and cluster analysis. I show here that any given MFT algorithm can be combined in a natural way with notions from the areas of constrained optimisation and adaptive simulated annealing to yield a single homogenous and efficient parallel relaxation technique, for which an externally prescribed annealing schedule is no longer required. The results of numerical simulations on 50-city and 100-city TSP problems are presented, which show that the ensuing algorithms are typically an order of magnitude faster than the MFT algorithms alone, and which also show, on occasion, superior solutions as well. 1 INTRODUCTION Several promising parallel analogue algorithms, which can be loosely described by the term "deterministic annealing", or "mean field theory (MFT) annealing", have *also at Theoretical Division and Center for Nonlinear Studies, MSB213, Los Alamos National Laboratory, Los Alamos, NM 87545.

algorithm, merging constrained optimisation, procedure, (12 more...)

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.45)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Darken, Christian, Moody, John

Towards Faster Stochastic Gradient Search

Stochastic gradient descent is a general algorithm which includes LMS, online backpropagation, and adaptive k-means clustering as special cases.

converge, convergence, gradient descent, (12 more...)

Country:

North America > United States > California (0.14)
North America > United States > Connecticut > New Haven County > New Haven (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Jordan, Michael I., Jacobs, Robert A.

Hierarchies of adaptive experts

Another class of nonlinear algorithms, exemplified by CART (Breiman, Friedman, Olshen, & Stone, 1984) and MARS (Friedman, 1990), generalizes classical techniques by partitioning the training data into non-overlapping regions and fitting separate models in each of the regions. These two classes of algorithms extend linear techniques in essentially independent directions, thus it seems worthwhile to investigate algorithms that incorporate aspects of both approaches to model estimation. Such algorithms would be related to CART and MARS as multilayer neural networks are related to linear statistical techniques. In this paper we present a candidate for such an algorithm. The algorithm that we present partitions its training data in the manner of CART or MARS, but it does so in a parallel, online manner that can be described as the stochastic optimization of an appropriate cost functional.

algorithm, architecture, expert network, (15 more...)

Country:

Asia > Middle East > Jordan (0.07)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Iowa > Story County > Ames (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Best-First Model Merging for Dynamic Learning and Recognition

Omohundro, Stephen M.

"Best-first model merging" is a general technique for dynamically choosing the structure of a neural or related architecture while avoiding overfitting. It is applicable to both leaming and recognition tasks and often generalizes significantly better than fixed structures. We demonstrate the approach applied to the tasks of choosing radial basis functions for function learning, choosing local affine models for curve and constraint surface modelling, and choosing the structure of a balltree or bumptree to maximize efficiency of access.

best-first model merging, complex model, dynamic learning, (14 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
(2 more...)