Country
Recognizing Hand-written Digits Using Hierarchical Products of Experts
Mayraz, Guy, Hinton, Geoffrey E.
The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a nonlinear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, it is helpful to learn a hierarchy of separate models for each digit class. Each model in the hierarchy has one layer of hidden units and the nth level model is trained on data that consists of the activities of the hidden units in the already trained (n - l)th level model. After training, each level produces a separate, unnormalized log probabilty score. With a three-level hierarchy for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logistic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data. 1 Learning products of stochastic binary experts Hinton [1] describes a learning algorithm for probabilistic generative models that are composed of a number of experts. Each expert specifies a probability distribution over the visible variables and the experts are combined by multiplying these distributions together and renormalizing.
Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators
We analyze the bit error probability of multiuser demodulators for directsequence binary phase-shift-keying (DSIBPSK) CDMA channel with additive gaussian noise. The problem of multiuser demodulation is cast into the finite-temperature decoding problem, and replica analysis is applied to evaluate the performance of the resulting MPM (Marginal Posterior Mode) demodulators, which include the optimal demodulator and the MAP demodulator as special cases. An approximate implementation of demodulators is proposed using analog-valued Hopfield model as a naive mean-field approximation to the MPM demodulators, and its performance is also evaluated by the replica analysis. Results of the performance evaluation shows effectiveness of the optimal demodulator and the mean-field demodulator compared with the conventional one, especially in the cases of small information bit rate and low noise level.
Fast Training of Support Vector Classifiers
Pérez-Cruz, Fernando, Alarcón-Diana, Pedro Luis, Navia-Vázquez, Angel, Artés-Rodríguez, Antonio
In this communication we present a new algorithm for solving Support Vector Classifiers (SVC) with large training data sets. The new algorithm is based on an Iterative Re-Weighted Least Squares procedure which is used to optimize the SVc. Moreover, a novel sample selection strategy for the working set is presented, which randomly chooses the working set among the training samples that do not fulfill the stopping criteria. The validity of both proposals, the optimization procedure and sample selection strategy, is shown by means of computer experiments using well-known data sets.
Automated State Abstraction for Options using the U-Tree Algorithm
Jonsson, Anders, Barto, Andrew G.
Learning a complex task can be significantly facilitated by defining a hierarchy of subtasks. An agent can learn to choose between various temporally abstract actions, each solving an assigned subtask, to accomplish the overall task. In this paper, we study hierarchical learning using the framework of options. We argue that to take full advantage of hierarchical structure, one should perform option-specific state abstraction, and that if this is to scale to larger tasks, state abstraction should be automated. We adapt McCallum's U-Tree algorithm to automatically build option-specific representations of the state feature space, and we illustrate the resulting algorithm using a simple hierarchical task. Results suggest that automated option-specific state abstraction is an attractive approach to making hierarchical learning systems more effective.
Bayesian Video Shot Segmentation
Vasconcelos, Nuno, Lippman, Andrew
Prior knowledge about video structure can be used both as a means to improve the peiformance of content analysis and to extract features that allow semantic classification. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models by introducing a Bayesian formulation for the shot segmentation problem. The new formulations is shown to extend standard thresholding methods in an adaptive and intuitive way, leading to improved segmentation accuracy.
Learning Switching Linear Models of Human Motion
Pavlovic, Vladimir, Rehg, James M., MacCormick, John
The human figure exhibits complex and rich dynamic behavior that is both nonlinear and time-varying. Effective models of human dynamics can be learned from motion capture data using switching linear dynamic system (SLDS) models. We present results for human motion synthesis, classification, and visual tracking using learned SLDS models. Since exact inference in SLDS is intractable, we present three approximate inference algorithms and compare their performance. In particular, a new variational inference algorithm is obtained by casting the SLDS model as a Dynamic Bayesian Network. Classification experiments show the superiority of SLDS over conventional HMM's for our problem domain.
Improved Output Coding for Classification Using Continuous Relaxation
Output coding is a general method for solving multiclass problems by reducing them to multiple binary classification problems. Previous research on output coding has employed, almost solely, predefined discrete codes. We describe an algorithm that improves the performance of output codes by relaxing them to continuous codes. The relaxation procedure is cast as an optimization problem and is reminiscent of the quadratic program for support vector machines. We describe experiments with the proposed algorithm, comparing it to standard discrete output codes. The experimental results indicate that continuous relaxations of output codes often improve the generalization performance, especially for short codes.
Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators
We analyze the bit error probability of multiuser demodulators for directsequence binary phase-shift-keying (DSIBPSK) CDMA channel with additive gaussian noise. The problem of multiuser demodulation is cast into the finite-temperature decoding problem, and replica analysis is applied to evaluate the performance of the resulting MPM (Marginal Posterior Mode) demodulators, which include the optimal demodulator and the MAP demodulator as special cases. An approximate implementation of demodulators is proposed using analog-valued Hopfield model as a naive mean-field approximation to the MPM demodulators, and its performance is also evaluated by the replica analysis. Results of the performance evaluation shows effectiveness of the optimal demodulator and the mean-field demodulator compared with the conventional one, especially in the cases of small information bit rate and low noise level.
Algorithmic Stability and Generalization Performance
Bousquet, Olivier, Elisseeff, André
Until recently, most of the research in that area has focused on uniform a-priori bounds giving a guarantee that the difference between the training error and the test error is uniformly small for any hypothesis in a given class. These bounds are usually expressed in terms of combinatorial quantities such as VCdimension. In the last few years, researchers have tried to use more refined quantities to either estimate the complexity of the search space (e.g.
APRICODD: Approximate Policy Construction Using Decision Diagrams
St-Aubin, Robert, Hoey, Jesse, Boutilier, Craig
We propose a method of approximate dynamic programming for Markov decision processes (MDPs) using algebraic decision diagrams (ADDs). We produce near-optimal value functions and policies with much lower time and space requirements than exact dynamic programming. Our method reduces the sizes of the intermediate value functions generated during value iteration by replacing the values at the terminals of the ADD with ranges of values. Our method is demonstrated on a class of large MDPs (with up to 34 billion states), and we compare the results with the optimal value functions.