Bayesian Inference
The g Factor: Relating Distributions on Features to Distributions on Images
We describe the g-factor, which relates probability distributions on image features to distributions on the images themselves. The g-factor depends only on our choice of features and lattice quantization and is independent of the training image data. We illustrate the importance of the g-factor by analyzing how the parameters of Markov Random Field (i.e. Gibbs or log-linear) probability models of images are learned from data by maximum likelihood estimation. In particular, we study homogeneous MRF models which learn image distributions in terms of clique potentials corresponding to feature histogram statistics (cf.
MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
Bayesian belief propagation in graphical models has been recently shown to have very close ties to inference methods based in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille's algorithm is based on a certain decomposition of the Bethe free energy and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy.
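The MIME reading of the Bethe free energy can be sketched for a pairwise model as follows. The notation is an assumption, not taken from the abstract: U is the average energy, H_i and H_{ij} are the entropies of the single-node and pairwise beliefs, d_i is the number of neighbors of node i, and I_{ij} is the mutual information of the pairwise belief on edge (ij).

```latex
\begin{align}
I_{ij} &= H_i + H_j - H_{ij}, \\
H_{\text{Bethe}} &= \sum_{(ij)} H_{ij} - \sum_i (d_i - 1)\, H_i
                  = \sum_i H_i - \sum_{(ij)} I_{ij}, \\
F_{\text{Bethe}} &= U - H_{\text{Bethe}}
                  = U + \sum_{(ij)} I_{ij} - \sum_i H_i .
\end{align}
```

Under this form, lowering the free energy at fixed U means lowering the pairwise mutual information terms and raising the marginal entropies, which is the sense of the acronym.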
A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing
Narayanan and Jurafsky (1998) proposed that human language comprehension can be modeled by treating human comprehenders as Bayesian reasoners, and modeling the comprehension process with Bayesian decision trees. In this paper we extend the Narayanan and Jurafsky model to make further predictions about reading time given the probability of different parses or interpretations, and test the model against reading time data from a psycholinguistic experiment.
Causal Categorization with Bayes Nets
A theory of categorization is presented in which knowledge of causal relationships between category features is represented as a Bayesian network. Referred to as causal-model theory, this theory predicts that objects are classified as category members to the extent they are likely to have been produced by a category's causal model. On this view, people have models of the world that lead them to expect a certain distribution of features in category members (e.g., correlations between feature pairs that are directly connected by causal relationships), and consider exemplars good category members when they manifest those expectations. These expectations include sensitivity to higher-order feature interactions that emerge from the asymmetries inherent in causal relationships. Research on the topic of categorization has traditionally focused on the problem of learning new categories given observations of category members.
Sequential Noise Compensation by Sequential Monte Carlo Method
We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. The method generates a set of samples according to the prior distribution given by clean speech models and a noise prior evolved from the previous estimate. An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimate of the noise parameter, and the prediction likelihood for weighting each sample. Minimum mean square error (MMSE) inference of the time-varying noise parameter is carried out over these samples by fusing the per-sample estimates according to their weights. A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calculation efficiency. Experiments were conducted on speech recognition in simulated non-stationary noises, where the noise power was changed artificially, and in highly non-stationary Machinegun noise.
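Two of the steps named in this abstract, the weighted MMSE fusion and the residual resampling selection, can be illustrated with a minimal generic sketch. It is not the authors' implementation: the function names, the NumPy-based layout, and the omission of the per-sample extended Kalman filter and the Metropolis-Hastings step are all assumptions made for brevity.

```python
import numpy as np

def mmse_estimate(estimates, weights):
    """Fuse per-sample estimates into one weighted mean (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize the importance weights
    return np.average(np.asarray(estimates, dtype=float), axis=0, weights=w)

def residual_resample(weights, rng):
    """Return the indices of the samples kept by residual resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    counts = np.floor(n * w).astype(int)  # deterministic copies of each sample
    n_left = n - counts.sum()             # slots still to be filled at random
    if n_left > 0:
        residual = n * w - counts         # leftover fractional mass
        residual = residual / residual.sum()
        counts = counts + rng.multinomial(n_left, residual)
    return np.repeat(np.arange(n), counts)

rng = np.random.default_rng(0)
indices = residual_resample([0.5, 0.25, 0.25, 0.0], rng)
fused = mmse_estimate([1.0, 2.0, 3.0], [0.2, 0.3, 0.5])
```

Residual resampling keeps the integer part of each expected copy count deterministically and draws only the remainder at random, which typically lowers the variance introduced by the selection step compared with plain multinomial resampling.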
Neural Implementation of Bayesian Inference in Population Codes
This study investigates a population decoding paradigm, in which the estimate of the stimulus from the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of the stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning.
Boosting and Maximum Likelihood for Exponential Models
We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.
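The role of normalization can be seen in a simplified binary setting, sketched below. This is an illustration under assumptions, not the paper's general formulation: labels are taken to be in {-1, +1}, `margins` holds the hypothetical quantities y * f(x) for each training example, and the function names are invented for the example.

```python
import numpy as np

def exp_loss(margins):
    """Exponential loss of an unnormalized model: sum of exp(-y * f(x))."""
    return float(np.sum(np.exp(-np.asarray(margins, dtype=float))))

def log_loss(margins):
    """Negative log-likelihood of the normalized model: sum of log(1 + exp(-y * f(x)))."""
    # logaddexp(0, -m) computes log(1 + exp(-m)) in a numerically stable way
    return float(np.sum(np.logaddexp(0.0, -np.asarray(margins, dtype=float))))

margins = np.array([2.0, 0.5, -1.0])
e = exp_loss(margins)
l = log_loss(margins)
```

Per example, the log-loss is log(1 + t) where t is that example's exponential loss, so the two objectives share the same exponential term and differ only through the normalization that turns scores into conditional probabilities.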
A Bayesian Network for Real-Time Musical Accompaniment
We describe a computer system that provides a real-time musical accompaniment for a live soloist in a piece of non-improvised music for soloist and accompaniment. A Bayesian network is developed that represents the joint distribution on the times at which the solo and accompaniment notes are played, relating the two parts through a layer of hidden variables. The network is first constructed using the rhythmic information contained in the musical score. The network is then trained to capture the musical interpretations of the soloist and accompanist in an off-line rehearsal phase. During live accompaniment the learned distribution of the network is combined with a real-time analysis of the soloist's acoustic signal, performed with a hidden Markov model, to generate a musically principled accompaniment that respects all available sources of knowledge.
Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex
Statistical learning and probabilistic inference techniques are used to infer the hand position of a subject from multi-electrode recordings of neural activity in motor cortex. First, an array of electrodes provides training data of neural firing conditioned on hand kinematics. We learn a non-parametric representation of this firing activity using a Bayesian model and rigorously compare it with previous models using cross-validation. Second, we infer a posterior probability distribution over hand motion conditioned on a sequence of neural test data using Bayesian inference. The learned firing models of multiple cells are used to define a non-Gaussian likelihood term which is combined with a prior probability for the kinematics.
Analysis of Sparse Bayesian Learning
The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model.
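The closed-form maximum can be sketched numerically. The abstract does not give the formula, so the form below is the one commonly quoted in the sparse Bayesian learning literature and should be read as an assumption: the symbols `s` (a "sparsity" quantity) and `q` (a "quality" quantity) for one basis function, the dependence of the log marginal likelihood on a single hyperparameter `alpha`, and both function names are illustrative.

```python
import math

def marginal_term(alpha, s, q):
    """Assumed alpha-dependent part of the log marginal likelihood for one basis function."""
    return 0.5 * (math.log(alpha) - math.log(alpha + s) + q * q / (alpha + s))

def best_alpha(s, q):
    """Stationary point of marginal_term in alpha; infinity means the parameter is pruned."""
    if q * q > s:
        return s * s / (q * q - s)   # finite maximum: the basis function stays in the model
    return math.inf                  # the maximum is at alpha -> infinity: prune

alpha_keep = best_alpha(1.0, 2.0)    # q^2 > s, finite optimum
alpha_prune = best_alpha(2.0, 1.0)   # q^2 <= s, pruned
```

Setting the derivative of `marginal_term` with respect to `alpha` to zero gives s^2 = alpha * (q^2 - s), which has a positive solution only when q^2 > s; otherwise the term increases monotonically in `alpha` and the maximum is reached at infinity.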