Stork, David G.
A Rapid Graph-based Method for Arbitrary Transformation-Invariant Pattern Classification
Sperduti, Alessandro, Stork, David G.
We present a graph-based method for rapid, accurate search through prototypes for transformation-invariant pattern classification. Our method has in theory the same recognition accuracy as other recent methods based on "tangent distance" [Simard et al., 1994], since it uses the same categorization rule. Nevertheless ours is significantly faster during classification because far fewer tangent distances need be computed. Crucial to the success of our system are 1) a novel graph architecture in which transformation constraints and geometric relationships among prototypes are encoded during learning, and 2) an improved graph search criterion, used during classification. These architectural insights are applicable to a wide range of problem domains. Here we demonstrate that on a handwriting recognition task, a basic implementation of our system requires less than half the computation of the Euclidean sorting method.
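For readers unfamiliar with the categorization rule shared with Simard et al., the following is a minimal sketch of one-sided tangent distance and the nearest-prototype rule built on it. It is not the authors' graph-based search (which avoids computing most of these distances); the function names and the exhaustive loop are illustrative assumptions.

```python
import numpy as np

def tangent_distance(x, prototype, tangents):
    """One-sided tangent distance from test vector x to a single prototype.

    tangents is a (D, k) matrix whose k columns approximate small
    transformations (rotation, translation, thickening, ...) of the prototype.
    """
    diff = x - prototype
    # Coefficients a minimizing ||diff - tangents @ a|| (least squares).
    a, *_ = np.linalg.lstsq(tangents, diff, rcond=None)
    return np.linalg.norm(diff - tangents @ a)

def classify(x, prototypes, tangent_sets, labels):
    """Nearest-prototype rule under tangent distance (exhaustive search)."""
    dists = [tangent_distance(x, p, T)
             for p, T in zip(prototypes, tangent_sets)]
    return labels[int(np.argmin(dists))]
```

The paper's contribution is precisely to replace the exhaustive loop above with a graph search that prunes most prototypes before any tangent distance is evaluated.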
Digital Boltzmann VLSI for constraint satisfaction and learning
Murray, Michael, Leung, Ming-Tak, Boonyanit, Kan, Kritayakirana, Kong, Burg, James B., Wolff, Gregory J., Watanabe, Tokahiro, Schwartz, Edward, Stork, David G., Peterson, Allen M.
We built a high-speed, digital mean-field Boltzmann chip and SBus board for general problems in constraint satisfaction and learning. Each chip has 32 neural processors and 4 weight update processors, supporting an arbitrary topology of up to 160 functional neurons. On-chip learning is at a theoretical maximum rate of 3.5 x 10^8 connection updates/sec; recall is 12,000 patterns/sec for typical conditions. The chip's high speed is due to parallel computation of inner products, limited (but adequate) precision for weights and activations (5 bits), a fast clock (125 MHz), and several design insights.
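As a software analogue of the recall phase such a chip accelerates, here is a hedged sketch of deterministic mean-field settling with a simple annealing schedule; the tanh units, temperature schedule, and function signature are assumptions for illustration, not the hardware design.

```python
import numpy as np

def mean_field_settle(weights, bias, values, clamped,
                      temps=(4.0, 2.0, 1.0, 0.5), sweeps=20):
    """Deterministic (mean-field) settling of a Boltzmann network.

    weights : symmetric (N, N) array with zero diagonal
    values  : initial activations in [-1, 1]; clamped units keep these values
    clamped : boolean mask of units held fixed (the network's inputs)
    """
    a = values.astype(float).copy()
    for T in temps:                      # anneal from high to low temperature
        for _ in range(sweeps):
            net = weights @ a + bias     # parallel inner products
            a = np.where(clamped, values, np.tanh(net / T))
    return a
```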
Lipreading by neural networks: Visual preprocessing, learning, and sensory integration
Wolff, Gregory J., Prasad, K. Venkatesh, Stork, David G., Hennecke, Marcus
Automated speech recognition is notoriously hard, and thus any predictive source of information and constraints that could be incorporated into a computer speech recognition system would be desirable. Humans, especially the hearing impaired, can utilize visual information - "speech reading" - for improved accuracy (Dodd & Campbell, 1987, Sanders & Goodrich, 1971). Speech reading can provide direct information about segments, phonemes, rate, speaker gender and identity, and subtle information for segmenting speech from background noise or multiple speakers (De Filippo & Sims, 1988, Green & Miller, 1985). Fundamental support for the use of visual information comes from the complementary nature of the visual and acoustic speech signals. Utterances that are difficult to distinguish acoustically are the easiest to distinguish visually.
Optimal Brain Surgeon: Extensions and performance comparisons
Hassibi, Babak, Stork, David G., Wolff, Gregory
We extend Optimal Brain Surgeon (OBS) - a second-order method for pruning networks - to allow for general error measures, and explore a reduced computational and storage implementation via a dominant eigenspace decomposition. Simulations on nonlinear, noisy pattern classification problems reveal that OBS does lead to improved generalization, and performs favorably in comparison with Optimal Brain Damage (OBD). We find that the required retraining steps in OBD may lead to inferior generalization, a result that can be interpreted as due to injecting noise back into the system. A common technique is to stop training of a large network at the minimum validation error. We found that the test error could be reduced even further by means of OBS (but not OBD) pruning.
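One plausible reading of the reduced-storage idea is to approximate the inverse Hessian by its dominant eigen-directions, which for H^-1 are the flattest directions of H. The sketch below is illustrative only; the isotropic tail term and the choice of m are assumptions, and the paper's exact decomposition may differ.

```python
import numpy as np

def lowrank_inverse_hessian(H, m):
    """Approximate H^-1 from its m flattest eigen-directions (m < n).

    In a storage-reduced implementation one would keep only the m
    eigenpairs (O(m*n) memory); the full matrix is formed here for clarity.
    """
    evals, evecs = np.linalg.eigh(H)                   # ascending eigenvalues
    V, lam = evecs[:, :m], evals[:m]                   # flattest directions
    lead = V @ np.diag(1.0 / lam) @ V.T                # dominant part of H^-1
    tail = (np.eye(H.shape[0]) - V @ V.T) / evals[m]   # crude isotropic tail
    return lead + tail
```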
Second order derivatives for network pruning: Optimal Brain Surgeon
Hassibi, Babak, Stork, David G.
We investigate the use of information from all second order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Solla, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H^-1 from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems [Thrun et al., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987] used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.
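A minimal sketch of the two ingredients the abstract names - the inverse-Hessian recursion and the saliency-based deletion with second-order weight correction - follows. The single-output, squared-error approximation of the Hessian and the regularizer alpha are assumptions for illustration, not the paper's exact code.

```python
import numpy as np

def inverse_hessian(jacobians, alpha=1e-4):
    """Build H^-1 recursively from per-pattern output gradients X_k.

    Assumes H = alpha*I + (1/P) * sum_k X_k X_k^T and applies the
    Sherman-Morrison update pattern by pattern; alpha keeps the
    starting inverse finite.
    """
    P = len(jacobians)
    n = jacobians[0].size
    H_inv = np.eye(n) / alpha
    for X in jacobians:
        X = X.reshape(-1, 1)
        H_inv = H_inv - (H_inv @ X @ X.T @ H_inv) / (P + X.T @ H_inv @ X)
    return H_inv

def obs_prune_step(w, H_inv):
    """Delete the lowest-saliency weight and adjust all remaining weights.

    Saliency: L_q = w_q^2 / (2 [H^-1]_qq)
    Update:   delta_w = -(w_q / [H^-1]_qq) * H^-1 e_q
    """
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))
    w_new = w - (w[q] / H_inv[q, q]) * H_inv[:, q]   # w_new[q] is ~0
    return w_new, q, float(saliency[q])
```

Repeating obs_prune_step (with periodic recomputation of H^-1) removes weights in order of increasing saliency while the second-order correction keeps the training error nearly unchanged.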