Uncertainty
Using Vocabulary Knowledge in Bayesian Multinomial Estimation
Griffiths, Thomas L., Tenenbaum, Joshua B.
Recent approaches have used uncertainty over the vocabulary of symbols in a multinomial distribution as a means of accounting for sparsity. We present a Bayesian approach that allows weak prior knowledge, in the form of a small set of approximate candidate vocabularies, to be used to dramatically improve the resulting estimates. We demonstrate these improvements in applications to text compression and estimating distributions over words in newsgroup data. 1 Introduction Sparse multinomial distributions arise in many statistical domains, including natural language processing and graphical models. Consequently, a number of approaches to parameter estimation for sparse multinomial distributions have been suggested [3]. These approaches tend to be domain-independent: they make little use of prior knowledge about a specific domain. In many domains where multinomial distributions are estimated there is often at least weak prior knowledge about the potential structure of distributions, such as a set of hypotheses about restricted vocabularies from which the symbols might be generated. Such knowledge can be solicited from experts or obtained from unlabeled data. We present a method for Bayesian parameter estimation in sparse discrete domains that exploits this weak form of prior knowledge to improve estimates over knowledge-free approaches.
Tempo tracking and rhythm quantization by sequential Monte Carlo
Cemgil, Ali Taylan, Kappen, Bert
We present a probabilistic generative model for timing deviations in expressive music. The structure of the proposed model is equivalent to a switching state space model. We formulate two well-known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out.
Unsupervised Learning of Human Motion Models
Song, Yang, Goncalves, Luis, Perona, Pietro
This paper presents an unsupervised learning algorithm that can derive the probabilistic dependence structure of parts of an object (a moving human body in our examples) automatically from unlabeled data. The distinguishing feature of this work is that it is based on unlabeled data, i.e., the training features include both useful foreground parts and background clutter, and the correspondence between the parts and detected features is unknown. We use decomposable triangulated graphs to depict the probabilistic independence of parts, but the unsupervised technique is not limited to this type of graph. In the new approach, the labeling of the data (part assignments) is treated as a set of hidden variables and the EM algorithm is applied. A greedy algorithm is developed to select parts and to search for the optimal structure based on the differential entropy of these variables. The success of our algorithm is demonstrated by applying it to generate models of human motion automatically from unlabeled real image sequences.
Learning Body Pose via Specialized Maps
Rosales, Rómer, Sclaroff, Stan
A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion. 1 Introduction In everyday life, humans can easily estimate body part locations (body pose) from relatively low-resolution images of the projected 3D world (e.g., when viewing a photograph or a video). However, body pose estimation is a very difficult computer vision problem.
Estimating the Reliability of ICA Projections
Meinecke, Frank C., Ziehe, Andreas, Kawanabe, Motoaki, Müller, Klaus-Robert
When applying unsupervised learning techniques like ICA or temporal decorrelation, a key question is whether the discovered projections are reliable. In other words: can we give error bars, or can we assess the quality of our separation? We use resampling methods to tackle these questions and show experimentally that our proposed variance estimations are strongly correlated with the separation error. We demonstrate that this reliability estimation can be used to choose the appropriate ICA model, to significantly enhance the separation performance, and, most importantly, to mark the components that have an actual physical meaning.
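The resampling idea in the abstract above can be illustrated with a generic bootstrap. This is only a sketch under stated assumptions: it puts an error bar on an arbitrary scalar estimator and is not the authors' ICA-specific procedure. The function name `bootstrap_std` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_std(data, estimator, n_resamples=200, seed=0):
    """Estimate the spread of `estimator` by resampling `data` with replacement.

    Illustrative only: a plain bootstrap over independent samples, not the
    resampling scheme used for ICA projections in the paper.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n points with replacement and re-run the estimator on them.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # The standard deviation across resamples serves as the error bar.
    return statistics.pstdev(estimates)
```

A call such as `bootstrap_std(samples, statistics.mean)` returns a larger value when the estimate is less stable under resampling, which is the sense in which resampling yields a reliability measure.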
Learning Spike-Based Correlations and Conditional Probabilities in Silicon
Shon, Aaron P., Hsu, David, Diorio, Chris
We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon.
A General Greedy Approximation Algorithm with Applications
Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
Products of Gaussians
Williams, Christopher, Agakov, Felix V., Felderhof, Stephen N.
Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. Below we consider PoE models in which each expert is a Gaussian. Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. We examine (1) products of Gaussian pancakes, which give rise to probabilistic Minor Components Analysis, (2) products of 1-factor PPCA models, and (3) a products of experts construction for an AR(1) process.
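The statement that a product of Gaussians is again a Gaussian can be checked in the one-dimensional case, where the precisions (inverse variances) of the experts add and the mean is their precision-weighted average. The following is a minimal sketch of that standard identity only; the function name `product_of_gaussians` is hypothetical, and the sketch does not cover the structured multivariate models examined in the paper.

```python
def product_of_gaussians(experts):
    """Combine 1-D Gaussian experts given as (mean, variance) pairs.

    The renormalized product of Gaussian densities is Gaussian: precisions
    add, and the mean is the precision-weighted average of the expert means.
    Returns the (mean, variance) of the product.
    """
    if not experts:
        raise ValueError("need at least one expert")
    precision = 0.0
    weighted_mean = 0.0
    for mean, variance in experts:
        if variance <= 0:
            raise ValueError("variances must be positive")
        precision += 1.0 / variance      # precisions accumulate
        weighted_mean += mean / variance  # means weighted by precision
    return weighted_mean / precision, 1.0 / precision
```

For two unit-variance experts centred at 0 and 2, the product has mean 1 and variance 0.5: each extra expert can only sharpen the combined distribution.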
Learning Discriminative Feature Transforms to Low Dimensions in Low Dimensions
The marriage of Renyi entropy with Parzen density estimation has been shown to be a viable tool in learning discriminative feature transforms. However, it suffers from computational complexity proportional to the square of the number of samples in the training data. This sets a practical limit to using large databases. We suggest immediate divorce of the two methods and remarriage of Renyi entropy with a semi-parametric density estimation method, such as a Gaussian Mixture Model (GMM). This allows all of the computation to take place in the low-dimensional target space, and it reduces the computational complexity to be proportional to the square of the number of components in the mixtures. Furthermore, a convenient extension to Hidden Markov Models as commonly used in speech recognition becomes possible.
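The quadratic cost mentioned above is visible in the standard Parzen estimate of Renyi's quadratic entropy: with Gaussian kernels, the integral of the squared density estimate becomes a double sum over all pairs of samples. The sketch below shows only this entropy estimate for 1-D data, assuming a fixed kernel width `sigma`; the function name is hypothetical, and the full discriminative transform learning described in the abstract is not reproduced.

```python
import math


def renyi_quadratic_entropy(samples, sigma):
    """Parzen estimate of Renyi's quadratic entropy for 1-D samples.

    With Gaussian kernels of width sigma, the integral of the squared density
    estimate reduces to a sum over all sample pairs of a Gaussian with
    variance 2 * sigma**2, so the cost grows as N**2.
    """
    n = len(samples)
    var = 2.0 * sigma ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    total = 0.0
    for xi in samples:          # N outer iterations ...
        for xj in samples:      # ... times N inner ones: O(N^2) kernel evaluations
            total += norm * math.exp(-((xi - xj) ** 2) / (2.0 * var))
    return -math.log(total / (n * n))
```

Replacing the N kernels with a mixture of K components, as the abstract proposes, would turn the same double sum into one over component pairs, hence a cost on the order of K squared.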