Learning in Compositional Hierarchies: Inducing the Structure of Objects from Data

Neural Information Processing Systems

I propose an algorithm for learning hierarchical models for object recognition. The model architecture is a compositional hierarchy that represents part-whole relationships: parts are described in the local context of substructures of the object. The focus of this report is learning hierarchical models from data, i.e., inducing the structure of model prototypes from observed exemplars of an object. At each node in the hierarchy, a probability distribution governing its parameters must be learned. The connections between nodes reflect the structure of the object. The formation of substructures is encouraged such that their parts become conditionally independent.
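The abstract's key ingredients, node-local parameter distributions expressed in a parent substructure's frame and conditional independence of sibling parts, can be sketched concretely. A minimal illustration in Python, assuming Gaussian parameter distributions and a tree-structured hierarchy (both assumptions are mine, not necessarily the paper's actual parameterization):

```python
import numpy as np

class PartNode:
    """One node in a compositional hierarchy (illustrative sketch).

    Each node stores a Gaussian over its part parameters, expressed in
    the local frame of its parent substructure, so that sibling parts
    are (approximately) conditionally independent given the parent.
    """
    def __init__(self, name):
        self.name = name
        self.children = []   # sub-parts of this substructure
        self.mean = None     # learned parameter mean
        self.cov = None      # learned parameter covariance

    def fit(self, exemplars):
        """exemplars: (n_exemplars, n_params) array of observed part
        parameters, already mapped into the parent's frame."""
        X = np.asarray(exemplars, dtype=float)
        self.mean = X.mean(axis=0)
        self.cov = np.cov(X, rowvar=False)

    def log_likelihood(self, x):
        """Log-density of a new observation under the learned Gaussian."""
        d = np.asarray(x, dtype=float) - self.mean
        inv = np.linalg.inv(self.cov)
        _, logdet = np.linalg.slogdet(self.cov)
        return -0.5 * (d @ inv @ d + logdet + len(d) * np.log(2 * np.pi))
```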


H∞ Optimality Criteria for LMS and Backpropagation

Neural Information Processing Systems

This fact provides a theoretical justification of the widely observed excellent robustness properties of the LMS and backpropagation algorithms. We further discuss some implications of these results.

1 Introduction

The LMS algorithm was originally conceived as an approximate recursive procedure that solves the following problem (Widrow and Hoff, 1960): given a sequence of n × 1 input column vectors {h_i}, and a corresponding sequence of desired scalar responses {d_i}
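For reference, the recursive procedure that the quoted problem statement alludes to is the classical LMS update. The sketch below is the textbook Widrow-Hoff recursion (the step size mu is a free parameter), not anything specific to the H∞ analysis:

```python
import numpy as np

def lms(h_seq, d_seq, mu=0.01):
    """Least-mean-squares: recursively estimate a weight vector w from
    n-by-1 input vectors {h_i} and desired scalar responses {d_i}.

    Update rule (Widrow and Hoff, 1960):
        w <- w + mu * h_i * (d_i - h_i^T w)
    """
    w = np.zeros(len(h_seq[0]))
    for h, d in zip(h_seq, d_seq):
        h = np.asarray(h, dtype=float)
        w += mu * h * (d - h @ w)   # step along the instantaneous gradient
    return w
```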


Optimal Brain Surgeon: Extensions and performance comparisons

Neural Information Processing Systems

We extend Optimal Brain Surgeon (OBS), a second-order method for pruning networks, to allow for general error measures, and explore a reduced computational and storage implementation via a dominant eigenspace decomposition. Simulations on nonlinear, noisy pattern classification problems reveal that OBS does lead to improved generalization, and performs favorably in comparison with Optimal Brain Damage (OBD). We find that the required retraining steps in OBD may lead to inferior generalization, a result that can be interpreted as due to injecting noise back into the system. A common technique is to stop training of a large network at the minimum validation error. We found that the test error could be reduced even further by means of OBS (but not OBD) pruning.
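For orientation, one OBS pruning step can be sketched as follows, using the standard OBS saliency and weight-update formulas; computing the inverse Hessian (or the dominant-eigenspace approximation this paper explores) is the expensive part and is taken as given here:

```python
import numpy as np

def obs_prune_step(w, H_inv):
    """One Optimal Brain Surgeon step (sketch).

    w     : flat weight vector
    H_inv : inverse Hessian of the error with respect to w

    Saliency of weight q:  L_q = w_q**2 / (2 * H_inv[q, q])
    Prune the lowest-saliency weight and adjust all others:
        dw = -(w_q / H_inv[q, q]) * H_inv[:, q]
    """
    saliency = w**2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))                 # cheapest weight to remove
    dw = -(w[q] / H_inv[q, q]) * H_inv[:, q]     # second-order correction
    w_new = w + dw                               # drives w_new[q] to zero
    w_new[q] = 0.0                               # enforce exact removal
    return w_new, q
```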


Monte Carlo Matrix Inversion and Reinforcement Learning

Neural Information Processing Systems

We describe the relationship between certain reinforcement learning (RL) methods based on dynamic programming (DP) and a class of unorthodox Monte Carlo methods for solving systems of linear equations proposed in the 1950s. These methods recast the solution of the linear system as the expected value of a statistic suitably defined over sample paths of a Markov chain. The significance of our observations lies in arguments (Curtiss, 1954) that these Monte Carlo methods scale better with respect to state-space size than do standard, iterative techniques for solving systems of linear equations. This analysis also establishes convergence rate estimates. Because methods used in RL systems for approximating the evaluation function of a fixed control policy also approximate solutions to systems of linear equations, the connection to these Monte Carlo methods establishes that algorithms very similar to TD algorithms (Sutton, 1988) are asymptotically more efficient in a precise sense than other methods for evaluating policies. Further, all DP-based RL methods have some of the properties of these Monte Carlo algorithms, which suggests that although RL is often perceived to be slow, for sufficiently large problems it may in fact be more efficient than other known classes of methods capable of producing the same results.
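A toy sketch of the connection being described: averaging discounted returns over sample paths of a Markov chain with transition matrix P and reward vector r estimates the value function v satisfying v = r + gamma * P v, i.e., it solves the linear system (I - gamma * P) v = r by Monte Carlo. The function name and the path-truncation horizon below are illustrative:

```python
import numpy as np

def mc_policy_evaluation(P, r, gamma, n_paths=1000, horizon=100, seed=0):
    """Estimate v = (I - gamma * P)^{-1} r by averaging discounted
    returns over sample paths of the Markov chain P (rows of P are
    transition probabilities). Truncating paths at `horizon` leaves a
    bias on the order of gamma**horizon.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    v = np.zeros(n)
    for s0 in range(n):
        total = 0.0
        for _ in range(n_paths):
            s, discount, ret = s0, 1.0, 0.0
            for _ in range(horizon):
                ret += discount * r[s]
                discount *= gamma
                s = rng.choice(n, p=P[s])   # follow the chain
            total += ret
        v[s0] = total / n_paths             # sample mean over paths
    return v
```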


Analyzing Cross-Connected Networks

Neural Information Processing Systems

The nonlinear complexities of neural networks make network solutions difficult to understand. Sanger's contribution analysis is here extended to the analysis of networks automatically generated by the cascade-correlation learning algorithm. Because such networks have cross connections that supersede hidden layers, standard analyses of hidden unit activation patterns are insufficient. A contribution is defined as the product of an output weight and the associated activation on the sending unit, whether that sending unit is an input or a hidden unit, multiplied by the sign of the output target for the current input pattern.
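The definition in the abstract translates directly into code; a minimal sketch (array names are mine):

```python
import numpy as np

def contributions(w_out, activations, target):
    """Contribution of each sending unit to one output unit, for one
    input pattern, per the definition above:

        c_j = w_out[j] * activations[j] * sign(target)

    where sending unit j may be an input or a hidden unit (so cross
    connections are covered), and `target` is the output's desired
    value for the current pattern.
    """
    return np.asarray(w_out) * np.asarray(activations) * np.sign(target)
```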



Estimating analogical similarity by dot-products of Holographic Reduced Representations

Neural Information Processing Systems

Gentner and Markman (1992) suggested that the ability to deal with analogy will be a "Watershed or Waterloo" for connectionist models. They identified "structural alignment" as the central aspect of analogy making. They noted the apparent ease with which people can perform structural alignment in a wide variety of tasks and were pessimistic about the prospects for the development of a distributed connectionist model that could be useful in performing structural alignment. In this paper I describe how Holographic Reduced Representations (HRRs) (Plate, 1991; Plate, 1994), a fixed-width distributed representation for nested structures, can be used to obtain fast estimates of analogical similarity.
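For concreteness, a minimal sketch of the HRR machinery the abstract names: role-filler pairs are bound by circular convolution, and structures are compared by dot product. The role and filler vectors and the toy "sentences" below are illustrative, not the paper's experiments:

```python
import numpy as np

def bind(a, b):
    """Circular convolution: the HRR binding operator (Plate, 1991),
    computed via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def hrr(n, rng):
    """Random HRR vector with elements drawn from N(0, 1/n)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n), n)

rng = np.random.default_rng(0)
n = 1024
agent, patient = hrr(n, rng), hrr(n, rng)      # shared role vectors
dog, cat, mouse = (hrr(n, rng) for _ in range(3))

s1 = bind(agent, dog) + bind(patient, cat)     # "dog chases cat"
s2 = bind(agent, dog) + bind(patient, mouse)   # "dog chases mouse"
s3 = bind(agent, cat) + bind(patient, dog)     # same fillers, roles swapped

print(np.dot(s1, s2))  # relatively high: shared agent binding
print(np.dot(s1, s3))  # near zero: bindings differ despite shared fillers
```

The dot product is sensitive to which filler occupies which role, which is why it can serve as a fast, structure-aware estimate of analogical similarity.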




The 1994 Florida AI Research Symposium

AI Magazine

The 1994 Florida AI Research Symposium was held 5-7 May at Pensacola Beach, Florida. This symposium brought together researchers and practitioners in AI, cognitive science, and allied disciplines to discuss timely topics, cutting-edge research, and system development efforts in areas spanning the entire AI field. Symposium highlights included Pat Hayes's comparison of the history of AI to the history of powered flight and Clark Glymour's discussion of the prehistory of AI.