Statistical Learning
Text-Based Information Retrieval Using Exponentiated Gradient Descent
Papka, Ron, Callan, James P., Barto, Andrew G.
The following investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that transforms the natural-language query into the query syntax of a real retrieval system: the initial query is expanded using statistical and learning techniques and is then used for document ranking and binary classification. The results of experiments suggest that Kivinen and Warmuth's Exponentiated Gradient Descent learning algorithm works significantly better than previous approaches. 1 Introduction The following work explores two learning algorithms - Least Mean Squared (LMS) [1] and Exponentiated Gradient Descent (EG) [2] - in the context of text-based Information Retrieval (IR) systems. The experiments presented in [3] use connectionist to improve the retrieval of relevant documents from a largelearning models collection of text. Previous the area employs various techniques for improving retrieval [6, 7, 14].
Clustering via Concave Minimization
Bradley, Paul S., Mangasarian, Olvi L., Street, W. Nick
If a polyhedral distance is used, the problem can be formulated as that of minimizing a piecewise-linear concave function on a polyhedral set which is shown to be equivalent to a bilinear program: minimizing a bilinear function on a polyhedral set.A fast finite k-Median Algorithm consisting of solving few linear programs in closed form leads to a stationary point of the bilinear program. Computational testing on a number of realworld databaseswas carried out. On the Wisconsin Diagnostic Breast Cancer (WDBC) database, k-Median training set correctness wascomparable to that of the k-Mean Algorithm, however its testing set correctness was better. Additionally, on the Wisconsin Prognostic Breast Cancer (WPBC) database, distinct and clinically importantsurvival curves were extracted by the k-Median Algorithm, whereas the k-Mean Algorithm failed to obtain such distinct survival curves for the same database.
Ordered Classes and Incomplete Examples in Classification
The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinalmodel for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incompleteexamples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.
Combinations of Weak Classifiers
To obtain classification systems with both good generalization performance andefficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers arelinear classifiers (perceptrons) which can do a little better than making random guesses. A randomized algorithm is proposed to find the weak classifiers. Theyยท are then combined through a majority vote.As demonstrated through systematic experiments, the method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications.
On a Modification to the Mean Field EM Algorithm in Factorial Learning
Dunmur, A. P., Titterington, D. M.
A modification is described to the use of mean field approximations inthe E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. Themodification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models. 1 Introduction Ghahramani (1995) advocated the use of mean field methods as a means to avoid the heavy computation involved in the E step of the EM algorithm used for estimating parameters within a certain latent structure model, and Ghahramani & Jordan (1995) used the same ideas in a more complex situation. Dunmur & Titterington (1996a) identified Ghahramani's model as a so-called latent profile model, they observed that Zhang (1992,1993) had used mean field methods for a similar purpose, and they showed, in a simulation study based on very simple examples, that the mean field version of the EM algorithm often performed very respectably. By this it is meant that, when data were generated from the model under analysis, the estimators of the underlying parameters were efficient, judging by empirical results, especially in comparison with estimators obtained by employing the'correct' EM algorithm: the examples therefore had to be simple enough that the correct EM algorithm is numerically feasible, although any success reported for the mean field 432 A. P. Dunmur and D. M. Titterington version is, one hopes, an indication that the method will also be adequate in more complex situations in which the correct EM algorithm is not implementable because of computational complexity. In spite of the above positive remarks, there were circumstances in which there was a perceptible, if not dramatic, lack of efficiency in the simple (naive) mean field estimators, and the objective of this contribution is to propose and investigate ways of refining the method so as to improve performance without detracting from the appealing, and frequently essential, simplicity of the approach. The procedure used here is based on a second order correction to the naive mean field well known in statistical physics and sometimes called the cavity or TAP method (Mezard, Parisi & Virasoro, 1987). It has been applied recently in cluster analysis (Hofmann & Buhmann, 1996). In Section 2 we introduce the structure of our model, Section 3 explains the refined mean field approach, Section 4 provides numerical results, and Section 5 contains a statement of our conclusions.
Improving the Accuracy and Speed of Support Vector Machines
Burges, Christopher J. C., Schรถlkopf, Bernhard
Support Vector Learning Machines (SVM) are finding application in pattern recognition, regression estimation, and operator inversion forill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in test phase, of SVMs are of increasing interest. Inthis paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the"virtual support vector" method) does so by incorporating known invariances of the problem. This method achieves a drop in the error rate on 10,000 NIST test digit images of 1.4% to 1.0%.
Adaptively Growing Hierarchical Mixtures of Experts
Fritsch, Jรผrgen, Finke, Michael, Waibel, Alex
We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed hereenables large hierarchies consisting of several hundred experts to be trained effectively. We show that HME's trained by our automatic growing procedure yield better generalization performance thantraditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS r9] speech recognition systemusing a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.
A Constructive RBF Network for Writer Adaptation
This paper discusses a fairly general adaptation algorithm which augments a standard neural network to increase its recognition accuracy fora specific user. The basis for the algorithm is that the output of a neural network is characteristic of the input, even when the output is incorrect. We exploit this characteristic output by using an Output Adaptation Module (OAM) which maps this output intothe correct user-dependent confidence vector. The OAM is a simplified Resource Allocating Network which constructs radial basisfunctions online. We applied the OAM to construct a writer-adaptive character recognition system for online handprinted characters.The OAM decreases the word error rate on a test set by an average of 45%, while creating only 3 to 25 basis functions for each writer in the test set. 1 Introduction One of the major difficulties in creating any statistical pattern recognition system is that the statistics of the training set is often different from the statistics in actual use.
Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning
Model learning combined with dynamic programming has been shown to be effective for learning control of continuous state dynamic systems. The simplest method assumes the learned model is correct and applies dynamic programming to it, but many approximators provide uncertainty estimates on the fit. How can they be exploited? This paper addresses the case where the system must be prevented from having catastrophic failures during learning.We propose a new algorithm adapted from the dual control literature and use Bayesian locally weighted regression models with dynamic programming.A common reinforcement learning assumption is that aggressive exploration should be encouraged. This paper addresses the converse casein which the system has to reign in exploration.
Interpolating Earth-science Data using RBF Networks and Mixtures of Experts
We present a mixture of experts (ME) approach to interpolate sparse, spatially correlated earth-science data. Kriging is an interpolation method which uses a global covariation model estimated from the data to take account of the spatial dependence in the data. Based on the close relationship between kriging and the radial basis function (RBF) network (Wan & Bone, 1996), we use a mixture of generalized RBF networks to partition the input space into statistically correlated regions and learn the local covariation model of the data in each region. Applying the ME approach to simulated and real-world data, we show that it is able to achieve good partitioning of the input space, learn the local covariation models and improve generalization.