Pattern Recognition
Boosting Decision Trees
Drucker, Harris, Cortes, Corinna
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Eachexpert is trained by minimizing a penalized local cross validation errorusing second order methods. In this way, an expert is able to find a local distance metric by adjusting the size and shape of the receptive fieldin which its predictions are valid, and also to detect relevant input features by adjusting its bias on the importance of individual input dimensions. We derive asymptotic results for our method. In a variety of simulations the properties of the algorithm are demonstrated with respect to interference, learning speed, prediction accuracy, feature detection, and task oriented incremental learning.
Laterally Interconnected Self-Organizing Maps in Hand-Written Digit Recognition
Choe, Yoonsuck, Sirosh, Joseph, Miikkulainen, Risto
An application of laterally interconnected self-organizing maps (LISSOM) to handwritten digit recognition is presented. The resulting excitatory connections focus the activity into local patches and the inhibitory connections decorrelate redundant activityon the map. The map thus forms internal representations thatare easy to recognize with e.g. a perceptron network. The recognition rate on a subset of NIST database 3 is 4.0% higher with LISSOM than with a regular Self-Organizing Map (SOM) as the front end, and 15.8% higher than recognition of raw input bitmaps directly. These results form a promising starting point for building pattern recognition systems with a LISSOM map as a front end. 1 Introduction Handwritten digit recognition has become one of the touchstone problems in neural networks recently.
From Data Mining to Knowledge Discovery in Databases
Fayyad, Usama, Piatetsky-Shapiro, Gregory, Smyth, Padhraic
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
Mean Field Theory for Sigmoid Belief Networks
Saul, L. K., Jaakkola, T., Jordan, M. I.
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition---the classification of handwritten digits.
Combining Estimators Using Non-Constant Weighting Functions
Tresp, Volker, Taniguchi, Michiaki
This paper discusses the linearly weighted combination of estimators in which the weighting functions are dependent on the input. We show that the weighting functions can be derived either by evaluating the input dependent variance of each estimator or by estimating how likely it is that a given estimator has seen data in the region of the input space close to the input pattern. The latter solution is closely related to the mixture of experts approach and we show how learning rules for the mixture of experts can be derived from the theory about learning with missing features. The presented approaches are modular since the weighting functions can easily be modified (no retraining) if more estimators are added. Furthermore, it is easy to incorporate estimators which were not derived from data such as expert systems or algorithms.
Coarse-to-Fine Image Search Using Neural Networks
Spence, Clay, Pearson, John C., Bergen, Jim
The efficiency of image search can be greatly improved by using a coarse-to-fine search strategy with a multi-resolution image representation. However, if the resolution is so low that the objects have few distinguishing features, search becomes difficult. We show that the performance of search at such low resolutions can be improved by using context information, i.e., objects visible at low-resolution which are not the objects of interest but are associated with them. The networks can be given explicit context information as inputs, or they can learn to detect the context objects, in which case the user does not have to be aware of their existence. We also use Integrated Feature Pyramids, which represent high-frequency information at low resolutions. The use of multiresolution search techniques allows us to combine information about the appearance of the objects on many scales in an efficient way. A natural fOlm of exemplar selection also arises from these techniques. We illustrate these ideas by training hierarchical systems of neural networks to find clusters of buildings in aerial photographs of farmland.
From Data Distributions to Regularization in Invariant Learning
Ideally pattern recognition machines provide constant output when the inputs are transformed under a group 9 of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of g, while leaving the corresponding targets unchanged. Alternatively the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed under the group. This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed (or distorted) examples to the training data. The cost function for the enhanced training set is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice - a term that penalizes changes in the output when the inputs are transformed under the group. For infinitesimal transformations, the coefficient of the regularization term reduces to the variance of the distortions introduced into the training data. This correspondence provides a simple bridge between the two approaches.
Coarse-to-Fine Image Search Using Neural Networks
Spence, Clay, Pearson, John C., Bergen, Jim
The efficiency of image search can be greatly improved by using a coarse-to-fine search strategy with a multi-resolution image representation. However, if the resolution is so low that the objects have few distinguishing features, search becomes difficult. We show that the performance of search at such low resolutions can be improved by using context information, i.e., objects visible at low-resolution which are not the objects of interest but are associated with them. The networks can be given explicit context information as inputs, or they can learn to detect the context objects, in which case the user does not have to be aware of their existence. We also use Integrated Feature Pyramids, which represent high-frequency information at low resolutions. The use of multiresolution search techniques allows us to combine information about the appearance of the objects on many scales in an efficient way. A natural fOlm of exemplar selection also arises from these techniques. We illustrate these ideas by training hierarchical systems of neural networks to find clusters of buildings in aerial photographs of farmland.
Combining Estimators Using Non-Constant Weighting Functions
Tresp, Volker, Taniguchi, Michiaki
This paper discusses the linearly weighted combination of estimators in which the weighting functions are dependent on the input. We show that the weighting functions can be derived either by evaluating the input dependent variance of each estimator or by estimating how likely it is that a given estimator has seen data in the region of the input space close to the input pattern. The latter solution is closely related to the mixture of experts approach and we show how learning rules for the mixture of experts can be derived from the theory about learning with missing features. The presented approaches are modular since the weighting functions can easily be modified (no retraining) if more estimators are added. Furthermore, it is easy to incorporate estimators which were not derived from data such as expert systems or algorithms.
From Data Distributions to Regularization in Invariant Learning
Ideally pattern recognition machines provide constant output when the inputs are transformed under a group 9 of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of g, while leaving the corresponding targets unchanged. Alternatively the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed under the group. This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed (or distorted) examples to the training data. The cost function for the enhanced training set is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice - a term that penalizes changes in the output when the inputs are transformed under the group. For infinitesimal transformations, the coefficient of the regularization term reduces to the variance of the distortions introduced into the training data. This correspondence provides a simple bridge between the two approaches.