Statistical Learning
Classifying Hand Gestures with a View-Based Distributed Representation
Darrell, Trevor J., Pentland, Alex P.
We present a method for learning, tracking, and recognizing human hand gestures recorded by a conventional CCD camera without any special gloves or other sensors. A view-based representation is used to model aspects of the hand relevant to the trained gestures, and is found using an unsupervised clustering technique. We use normalized correlation networks, with dynamic time warping in the temporal domain, as a distance function for unsupervised clustering. Views are computed separably for space and time dimensions; the distributed response of the combination of these units characterizes the input data with a low dimensional representation. A supervised classification stage uses labeled outputs of the spatiotemporal units as training data. Our system can correctly classify gestures in real time with a low-cost image processing accelerator.
The "Softmax" Nonlinearity: Derivation Using Statistical Mechanics and Useful Properties as a Multiterminal Analog Circuit Element
Elfadel, I. M., J. L. Wyatt, Jr.
In this paper, we show a reciprocal implementation of the "softmax" nonlinearity that is usually used to enforce local competition between neurons [Peterson, 1989]. We show that the circuit is passive and incrementally passive, and we explicitly compute its content and co-content functions. This circuit adds a new element to the library of the analog circuit designer that can be combined with reciprocal constraint boxes [Harris, 1988] and nonlinear resistive fuses [Harris, 1989] to form fast, analog VLSI optimization networks.
A Learning Analog Neural Network Chip with Continuous-Time Recurrent Dynamics
The recurrent network, containing six continuous-time analog neurons and 42 free parameters (connection strengths and thresholds), is trained to generate time-varying outputs approximating given periodic signals presented to the network. The chip implements a stochastic perturbative algorithm, which observes the error gradient along random directions in the parameter space for error-descent learning. In addition to the integrated learning functions and the generation of pseudo-random perturbations, the chip provides for teacher forcing and long-term storage of the volatile parameters. The network learns a 1 kHz circular trajectory in 100 sec. The chip occupies 2mm x 2mm in a 2JLm CMOS process, and dissipates 1.2 m W. 1 Introduction Exact gradient-descent algorithms for supervised learning in dynamic recurrent networks [1-3] are fairly complex and do not provide for a scalable implementation in a standard 2-D VLSI process. We have implemented a fairly simple and scalable ·Present address: Johns Hopkins University, ECE Dept., Baltimore MD 21218-2686.
A Hybrid Radial Basis Function Neurocomputer and Its Applications
Watkins, Steven S., Chau, Paul M., Tawel, Raoul, Lambrigtsen, Bjorn, Plutowski, Mark
A neurocomputer was implemented using radial basis functions and a combination of analog and digital VLSI circuits. The hybrid system uses custom analog circuits for the input layer and a digital signal processing board for the hidden and output layers. The system combines the advantages of both analog and digital circuits.
Exploiting Chaos to Control the Future
Flake, Gary W., Sun, Guo-Zhen, Lee, Yee-Chun
Recently, Ott, Grebogi and Yorke (OGY) [6] found an effective method to control chaotic systems to unstable fixed points by using only small control forces; however, OGY's method is based on and limited to a linear theory and requires considerable knowledge of the dynamics of the system to be controlled. In this paper we use two radial basis function networks: one as a model of an unknown plant and the other as the controller. The controller is trained with a recurrent learning algorithm to minimize a novel objective function such that the controller can locate an unstable fixed point and drive the system into the fixed point with no a priori knowledge of the system dynamics. Our results indicate that the neural controller offers many advantages over OGY's technique.
A Hodgkin-Huxley Type Neuron Model That Learns Slow Non-Spike Oscillation
Doya, Kenji, Selverston, Allen I., Rowat, Peter F.
A gradient descent algorithm for parameter estimation which is similar to those used for continuous-time recurrent neural networks was derived for Hodgkin-Huxley type neuron models. Using membrane potential trajectories as targets, the parameters (maximal conductances, thresholds and slopes of activation curves, time constants) were successfully estimated. The algorithm was applied to modeling slow non-spike oscillation of an identified neuron in the lobster stomatogastric ganglion. A model with three ionic currents was trained with experimental data. It revealed a novel role of A-current for slow oscillation below -50 mY. 1 INTRODUCTION Conductance-based neuron models, first formulated by Hodgkin and Huxley [10], are commonly used for describing biophysical mechanisms underlying neuronal behavior.
Fool's Gold: Extracting Finite State Machines from Recurrent Network Dynamics
Several recurrent networks have been proposed as representations for the task of formal language learning. After training a recurrent network recognize a formal language or predict the next symbol of a sequence, the next logical step is to understand the information processing carried out by the network. Some researchers have begun to extracting finite state machines from the internal state trajectories of their recurrent networks. This paper describes how sensitivity to initial conditions and discrete measurements can trick these extraction methods to return illusory finite state descriptions.
Solvable Models of Artificial Neural Networks
Solvable models of nonlinear learning machines are proposed, and learning in artificial neural networks is studied based on the theory of ordinary differential equations. A learning algorithm is constructed, by which the optimal parameter can be found without any recursive procedure. The solvable models enable us to analyze the reason why experimental results by the error backpropagation often contradict the statistical learning theory.
Optimal Stopping and Effective Machine Complexity in Learning
Wang, Changfeng, Venkatesh, Santosh S., Judd, J. Stephen
We study tltt' problem of when to stop If'arning a class of feedforward networks - networks with linear outputs I1PUrOIl and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there a.re in general three distinct phases in the generalization performance in the learning process, and in particular, the network has hetter gt'neralization pPTformance when learning is stopped at a certain time before til(' global miniIl111lu of the empirical error is reachert. A notion of effective size of a machine is rtefil1e i and used to explain the tradeoff betwf'en the complexity of the marhine and the training error ill the learning process. The study leads nat.urally to a network size selection critt'rion, which turns Ol1t to be a generalization of Akaike's Information Criterioll for the It'arning process. It if; shown that stopping Iparning before tiJt' global minimum of the empirical error has the effect of network size splectioll. 1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [11 and many others [5, 1, 10,3,111 all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimf'nsion generalizes better. However, it is noted that these methods or criteria do not npcpssarily If'ad to optimal (or llearly optimal) generalization performance.