Machine Learning
On Neural Networks with Minimal Weights
Bohossian, Vasken, Bruck, Jehoshua
A linear threshold element computes a function that is the sign of a weighted sum of the input variables. The weights are arbitrary integers; in fact, they can be very big integers, exponential in the number of input variables. In practice, however, it is difficult to implement big weights. In the present literature a distinction is made between two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights. The main contribution of this paper is to fill this gap by further refining that separation.
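The two weight regimes the abstract contrasts can be illustrated with a small sketch (function names and the convention that a zero sum maps to +1 are choices made here, not taken from the paper):

```python
def linear_threshold(weights, x):
    """Linear threshold element: sign of the weighted sum of the inputs."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else -1

# Majority vote on three +/-1 inputs needs only constant-size weights:
assert linear_threshold([1, 1, 1], [1, -1, 1]) == 1

# Comparing two n-bit numbers is threshold-computable, but the direct
# weighting below uses weights exponential in n: sum_i 2**i * (x_i - y_i).
def greater_equal(x_bits, y_bits):          # bits given least-significant first
    n = len(x_bits)
    weights = [2 ** i for i in range(n)] + [-(2 ** i) for i in range(n)]
    return linear_threshold(weights, list(x_bits) + list(y_bits))

assert greater_equal([1, 1], [0, 1]) == 1   # 3 >= 2
assert greater_equal([0, 1], [1, 1]) == -1  # 2 >= 3 is false
```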
Using the Future to "Sort Out" the Present: Rankprop and Multitask Learning for Medical Risk Evaluation
Caruana, Rich, Baluja, Shumeet, Mitchell, Tom
This paper presents two methods that can improve generalization on a broad class of problems. This class includes identifying low risk pneumonia patients. The first method, rankprop, tries to learn simple models that support ranking future cases while simultaneously learning to rank the training set. The second, multitask learning, uses lab tests available only during training as additional target values to bias learning towards a more predictive hidden layer. Experiments using a database of pneumonia patients indicate that together these methods outperform standard backpropagation by 10-50%. Rankprop and MTL are applicable to a large class of problems in which the goal is to learn a relative ranking over the instance space, and where the training data includes features that will not be available at run time. Such problems include identifying higher-risk medical patients as early as possible, identifying lower-risk financial investments, and visual analysis of scenes that become easier to analyze as they are approached in the future.
Acknowledgements: We thank Greg Cooper, Michael Fine, and other members of the Pitt/CMU Cost-Effective Health Care group for help with the Medis Database. This work was supported by ARPA grant F33615-93-1-1330, NSF grant BES-9315428, Agency for Health Care Policy and Research grant HS06468, and an NSF Graduate Student Fellowship (Baluja).
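The multitask-learning idea, training extra output heads on features (here, lab tests) that exist only at training time so that they shape a shared hidden layer, can be sketched as follows. This is a generic illustration, not the paper's network; the data, sizes, and learning rate are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 5 input features, 1 main risk target, and 2 auxiliary
# "lab test" targets that are observed only during training.
X = rng.normal(size=(32, 5))
y_main = rng.normal(size=(32, 1))
y_labs = rng.normal(size=(32, 2))
Y = np.hstack([y_main, y_labs])          # all targets share one hidden layer

W1 = rng.normal(scale=0.1, size=(5, 8))  # shared input-to-hidden weights
W2 = rng.normal(scale=0.1, size=(8, 3))  # one output column per target

for _ in range(200):                     # plain batch gradient descent
    H = np.tanh(X @ W1)
    err = H @ W2 - Y                     # error on main + auxiliary targets
    dW2 = H.T @ err / len(X)
    dW1 = X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)
    W2 -= 0.1 * dW2
    W1 -= 0.1 * dW1

# At run time only the main output is used; the lab-test heads merely
# biased the shared hidden layer during training.
risk = np.tanh(X @ W1) @ W2[:, :1]
```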
Primitive Manipulation Learning with Connectionism
Infants' manipulative exploratory behavior within the environment is a vehicle of cognitive stimulation [McCall 1974]. During this time, infants practice and perfect sensorimotor patterns that become behavioral modules which will be seriated and embedded in more complex actions. This paper explores the development of such primitive learning systems using an embodied lightweight hand which will be used for a humanoid being developed at the MIT Artificial Intelligence Laboratory [Brooks and Stein 1993]. Primitive grasping procedures are learned from sensory inputs using a connectionist reinforcement algorithm, while two submodules preprocess sensory data to recognize the hardness of objects and to detect shear, using competitive learning and back-propagation algorithm strategies, respectively. This system is not only consistent and quick during the initial learning stage, but also adaptable to new situations after training is completed.
Finite State Automata that Recurrent Cascade-Correlation Cannot Represent
This paper relates the computational power of Fahlman's Recurrent Cascade Correlation (RCC) architecture to that of finite state automata (FSA). While some recurrent networks are FSA equivalent, RCC is not. The paper presents a theoretical analysis of the RCC architecture in the form of a proof describing a large class of FSA which cannot be realized by RCC. 1 INTRODUCTION Recurrent networks can be considered to be defined by two components: a network architecture and a learning rule. The former describes how a network with a given set of weights and topology computes its output values, while the latter describes how the weights (and possibly topology) of the network are updated to fit a specific problem. It is possible to evaluate the computational power of a network architecture by analyzing the types of computations a network could perform assuming appropriate connection weights (and topology).
Gradient and Hamiltonian Dynamics Applied to Learning in Neural Networks
Howse, James W., Abdallah, Chaouki T., Heileman, Gregory L.
Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131. The process of machine learning can be considered in two stages: model selection and parameter estimation. In this paper a technique is presented for constructing dynamical systems with desired qualitative properties. The approach is based on the fact that an n-dimensional nonlinear dynamical system can be decomposed into one gradient and (n - 1) Hamiltonian systems. Thus, the model selection stage consists of choosing the gradient and Hamiltonian portions appropriately so that a certain behavior is obtainable. To estimate the parameters, a stably convergent learning rule is presented.
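One common way to write such a gradient-plus-Hamiltonian decomposition (the specific notation here is an illustration, not necessarily the paper's) is

```latex
\dot{x} \;=\; -\nabla V(x) \;+\; \sum_{i=1}^{n-1} J_i \,\nabla H_i(x),
\qquad J_i^{\mathsf{T}} = -J_i ,
```

where the gradient term dissipates the potential $V$ along trajectories of its own flow ($\dot{V} = -\|\nabla V\|^2 \le 0$), while each skew-symmetric term conserves its Hamiltonian $H_i$ ($\dot{H_i} = \nabla H_i^{\mathsf{T}} J_i \nabla H_i = 0$). Choosing $V$ and the $H_i$ then fixes the qualitative behavior: the dissipative part determines where trajectories settle, and the conservative parts determine how they circulate.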
A Multiscale Attentional Framework for Relaxation Neural Networks
Tsioutsias, Dimitris I., Mjolsness, Eric
Many practical problems in computer vision, pattern recognition, robotics and other areas can be described in terms of constrained optimization. In the past decade, researchers have proposed means of solving such problems with the use of neural networks [Hopfield & Tank, 1985; Koch et al., 1986], which are thus derived as relaxation dynamics for the objective functions codifying the optimization task. One disturbing aspect of the approach soon became obvious, namely the apparent inability of the methods to scale up to practical problems, the principal reason being the rapid increase in the number of local minima present in the objectives as the dimension of the problem increases. Moreover, most objectives, E(v), are highly nonlinear, non-convex functions of v, and simple techniques (e.g.
Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control
Choi, Samuel P. M., Yeung, Dit-Yan
The controllers usually have little or no prior knowledge of the environment. While only local communication between controllers is allowed, they must cooperate among themselves to achieve the common, global objective. Finding the optimal routing policy in such a distributed manner is very difficult. Moreover, since the environment is non-stationary, the optimal policy varies with time as a result of changes in network traffic and topology.
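The baseline that Predictive Q-Routing builds on is the Q-routing update of Boyan and Littman, in which a node revises its delivery-time estimate for a neighbor from that neighbor's own best estimate onward. A minimal sketch (the data layout and variable names are illustrative; the predictive extension, which also tracks how congested links recover, is not shown):

```python
def q_routing_update(Q, x, y, dest, queue_time, trans_time, lr=0.5):
    """Node x sent a packet bound for `dest` via neighbor y.
    y reports its best remaining-delivery-time estimate; x moves its
    estimate Q[x][dest][y] toward queue + transmission + remaining time."""
    remaining = min(Q[y][dest].values())          # y's best estimate onward
    target = queue_time + trans_time + remaining
    Q[x][dest][y] += lr * (target - Q[x][dest][y])
    return Q[x][dest][y]

# Tiny example: nodes 'a' and 'b', destination 'd' reachable from 'b'.
Q = {'a': {'d': {'b': 4.0}},
     'b': {'d': {'d': 1.0}}}
q_routing_update(Q, 'a', 'b', 'd', queue_time=0.5, trans_time=1.0)
# target = 0.5 + 1.0 + 1.0 = 2.5, so Q['a']['d']['b'] moves 4.0 -> 3.25
```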
Geometry of Early Stopping in Linear Networks
A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a certain quadric surface, usually an ellipsoid. Given a training set and a candidate early stopping weight vector, all validation sets have least-squares weights lying on a certain plane. This latter fact can be exploited to estimate the probability of stopping at any given point along the trajectory from the initial weight vector to the least-squares weights derived from the training set, and to estimate the probability that training goes on indefinitely. The prospects for extending this theory to nonlinear models are discussed.
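The setting analyzed here, gradient descent on a linear model's training error, with the stopping point chosen by validation error along the trajectory from the initial weights toward the training least-squares solution, can be sketched as a discrete-time approximation (the data, step size, and iteration count below are invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Xtr, Xval = rng.normal(size=(40, 3)), rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
ytr = Xtr @ w_true + rng.normal(scale=0.5, size=40)
yval = Xval @ w_true + rng.normal(scale=0.5, size=20)

w = np.zeros(3)                        # initial weight vector
lr, best_err, best_w = 0.01, np.inf, w.copy()
for _ in range(500):                   # discrete steps approximating the
    w -= lr * Xtr.T @ (Xtr @ w - ytr) / len(Xtr)   # continuous-time flow
    val_err = np.mean((Xval @ w - yval) ** 2)
    if val_err < best_err:             # record the early-stopping point
        best_err, best_w = val_err, w.copy()

# best_w is the early-stopping weight vector: a point on the trajectory
# from the origin toward the training-set least-squares solution.
```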