AITopics

Country:

North America > United States > California (0.28)
North America > Canada > Ontario > Toronto (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Best-First Model Merging for Dynamic Learning and Recognition

Omohundro, Stephen M.

Stephen M. Omohundro International Computer Science Institute 1947 Center Street, Suite 600 Berkeley, California 94704 Abstract "Best-first model merging" is a general technique for dynamically choosing the structure of a neural or related architecture while avoiding overfitting. It is applicable to both learning and recognition tasks and often generalizes significantly better than fixed structures. We demonstrate the approach applied to the tasks of choosing radial basis functions for function learning, choosing local affine models for curve and constraint surface modelling, and choosing the structure of a balltree or bumptree to maximize efficiency of access. 1 TOWARD MORE COGNITIVE LEARNING Standard backpropagation neural networks learn in a way which appears to be quite different from human learning. Viewed as a cognitive system, a standard network always maintains a complete model of its domain. This model is mostly wrong initially, but gets gradually better and better as data appears. The net deals with all data in much the same way and has no representation for the strength of evidence behind a certain conclusion. The network architecture is usually chosen before any data is seen and the processing is much the same in the early phases of learning as in the late phases.

Krogh, Anders, Hertz, John A.

A Simple Weight Decay Can Improve Generalization

It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network.

artificial intelligence, machine learning, weight decay, (14 more...)

Country:

Europe (0.29)
North America > United States > California > Santa Cruz County > Santa Cruz (0.14)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
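
As a rough illustration of the weight decay mentioned in the Krogh and Hertz entry above, the sketch below adds a penalty of the form (decay / 2) * sum(w_i^2) to a squared-error cost and runs plain gradient descent. The toy line-fitting task, the function names (`decayed_step`, `fit_line`), and all hyperparameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
def decayed_step(weights, grads, lr, decay):
    """One gradient-descent step on E0(w) + (decay / 2) * sum(w_i ** 2)."""
    # The penalty adds decay * w_i to each gradient component,
    # so every step also shrinks each weight toward zero.
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]


def fit_line(xs, ys, decay=0.0, lr=0.05, steps=2000):
    """Fit y ~ w[0] + w[1] * x by batch gradient descent (toy example)."""
    w = [0.0, 0.0]  # [intercept, slope]
    n = len(xs)
    for _ in range(steps):
        errs = [w[0] + w[1] * x - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error (with a factor of 1/2).
        grads = [sum(errs) / n, sum(e * x for e, x in zip(errs, xs)) / n]
        w = decayed_step(w, grads, lr, decay)
    return w


# Hypothetical data lying exactly on y = 1 + 2x.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
plain = fit_line(xs, ys)               # no decay
shrunk = fit_line(xs, ys, decay=0.5)   # with decay
```

On noise-free data like this the decayed fit is simply biased toward smaller weights; the generalization benefit reported in the abstract concerns settings where unconstrained weights would otherwise fit noise.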

Siu, Kai-Yeung, Bruck, Jehoshua

Neural Computing with Small Weights

Kai-Yeung Siu Dept. of Electrical & Computer Engineering University of California, Irvine Irvine, CA 92717 Jehoshua Bruck IBM Research Division Almaden Research Center San Jose, CA 95120-6099 Abstract An important issue in neural computation is the dynamic range of weights in the neural networks. Many experimental results on learning indicate that the weights in the networks can grow prohibitively large with the size of the inputs. Here we address this issue by studying the tradeoffs between the depth and the size of weights in polynomial-size networks of linear threshold elements (LTEs). We show that there is an efficient way of simulating a network of LTEs with large weights by a network of LTEs with small weights. To prove these results, we use tools from harmonic analysis of Boolean functions.

logic & formal reasoning, machine learning, n-bit number, (16 more...)

Country: North America > United States > California > Orange County > Irvine (0.54)

Industry:

Government > Regional Government > North America Government > United States Government (0.69)
Government > Military (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.38)

Zhao, Ying, Atkeson, Christopher G.

Some Approximation Properties of Projection Pursuit Learning Networks

Ying Zhao Christopher G. Atkeson The Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139 Abstract This paper will address an important question in machine learning: What kind of network architectures work better on what kind of problems? A projection pursuit learning network has a very similar structure to a one hidden layer sigmoidal neural network. A general method based on a continuous version of projection pursuit regression is developed to show that projection pursuit regression works better on angular smooth functions than on Laplacian smooth functions. There exists a ridge function approximation scheme to avoid the curse of dimensionality for approximating functions in L^2(φ_d). 1 INTRODUCTION Projection pursuit is a nonparametric statistical technique to find "interesting" low dimensional projections of high dimensional data sets. It has been used for nonparametric fitting and other data-analytic purposes (Friedman and Stuetzle, 1981, Huber, 1985).

artificial intelligence, machine learning, smooth function, (12 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.24)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Ji, Chuanyi, Psaltis, Demetri

The VC-Dimension versus the Statistical Capacity of Multilayer Networks

The former characterizes their ... Present address: Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180.

artificial intelligence, machine learning, vc-dimension, (16 more...)

Country: North America > United States > New York > Rensselaer County > Troy (0.24)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.50)

Kuh, Anthony, Petsche, Thomas, Rivest, Ronald L.

Incrementally Learning Time-varying Half-planes

For a dichotomy, concept drift means that the classification function changes over time. We want to extend the theoretical analyses of learning to include time-varying concepts; to explore the behavior of current learning algorithms in the face of concept drift; and to devise tracking algorithms to better handle concept drift. In this paper, we briefly describe our theoretical model and then present the results of simulations.

adversary, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > Hawaii (0.34)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Freund, Yoav, Haussler, David

Unsupervised learning of distributions on binary vectors using two layer networks

We study a particular type of Boltzmann machine with a bipartite graph structure called a harmonium. Our interest is in using such a machine to model a probability distribution on binary input vectors. We analyze the class of probability distributions that can be modeled by such machines.

artificial intelligence, harmonium model, machine learning, (17 more...)

Country: North America > United States > California > Santa Cruz County > Santa Cruz (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)
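
To make the harmonium entry above more concrete, here is a minimal sketch of how a bipartite Boltzmann machine induces a distribution on binary input vectors. It assumes the commonly used energy E(x, h) = -(b·x + c·h + sum_j h_j (W_j·x)) with binary hidden units; this parameterization, the function names, and the weights are illustrative assumptions and may differ from the paper's exact formulation. Because there are no hidden-to-hidden connections, the sum over hidden states factorizes into a product.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def joint_weight(x, h, W, b, c):
    """exp(-E(x, h)) for one visible/hidden configuration."""
    act = dot(b, x) + sum(hj * (cj + dot(wj, x)) for hj, wj, cj in zip(h, W, c))
    return math.exp(act)


def marginal_weight(x, W, b, c):
    """Sum of joint_weight over all hidden states, in closed form."""
    # Each hidden unit contributes an independent factor 1 + exp(input_j),
    # because hidden units only interact through the visible vector x.
    s = math.exp(dot(b, x))
    for wj, cj in zip(W, c):
        s *= 1.0 + math.exp(cj + dot(wj, x))
    return s


def visible_distribution(W, b, c):
    """Normalized distribution over all binary visible vectors (small n only)."""
    xs = list(itertools.product((0, 1), repeat=len(b)))
    weights = [marginal_weight(x, W, b, c) for x in xs]
    z = sum(weights)  # partition function, by brute-force enumeration
    return {x: w / z for x, w in zip(xs, weights)}


# Hypothetical parameters: 3 visible units, 2 hidden units.
W = [[1.0, -2.0, 0.5], [-1.5, 1.0, 1.0]]
b = [0.1, -0.3, 0.0]
c = [0.2, -0.4]
dist = visible_distribution(W, b, c)
```

Enumerating every visible vector is only feasible for a handful of units; the sketch is meant to show the factorized marginal, not a practical learning procedure.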

Bertoni, Alberto, Campadelli, Paola, Morpurgo, Anna, Panizza, Sandra

Polynomial Uniform Convergence of Relative Frequencies to Probabilities

We define the concept of polynomial uniform convergence of relative frequencies to probabilities in the distribution-dependent context.

artificial intelligence, machine learning, relative frequency, (12 more...)

Country: Europe > Italy (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John

Tangent Prop - A formalism for specifying selected invariances in an adaptive network

In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.

artificial intelligence, machine learning, tangent vector, (15 more...)
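
The idea in the Tangent Prop entry above, constraining a directional derivative of the learned function, can be sketched on a deliberately tiny model. The example below is an assumption-laden toy, not the authors' network implementation: it uses a linear model f(x) = w·x, for which the derivative along a tangent vector t is simply w·t, and adds the penalty (mu / 2) * (w·t)^2 to a squared-error cost to encourage invariance along t. The data, the name `fit`, and the values of `mu`, `lr`, and `steps` are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit(xs, ys, tangent, mu=0.0, lr=0.05, steps=2000):
    """Gradient descent on 0.5 * sum(err^2) + 0.5 * mu * (w . tangent)^2."""
    w = [0.0] * len(tangent)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = dot(w, x) - y
            grad = [g + err * xi for g, xi in zip(grad, x)]
        # Directional derivative of f(x) = w . x along the tangent vector.
        d = dot(w, tangent)
        # Penalty gradient pushes that derivative toward zero (invariance).
        grad = [g + mu * d * ti for g, ti in zip(grad, tangent)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Hypothetical data: the target should depend only on the first coordinate,
# but the third sample is noisy and suggests a dependence on the second.
xs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
ys = [0.0, 1.0, 1.4]
tangent = (0.0, 1.0)  # direction along which the output should not change

plain = fit(xs, ys, tangent)              # no derivative constraint
invariant = fit(xs, ys, tangent, mu=10.0)  # derivative along tangent penalized
```

In a multilayer network the directional derivative is not available in closed form and has to be propagated through the layers; the linear case is used here only because it keeps the penalty term easy to inspect.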