Bayesian Inference
Local Sampling-based Planning with Sequential Bayesian Updates
Lai, Tin, Morere, Philippe, Ramos, Fabio, Francis, Gilad
Sampling-based planners are the predominant motion planning paradigm for robots. The majority of sampling-based planners use a global random sampling scheme to guarantee completeness. However, these schemes are sample inefficient, as the majority of the samples are wasted in narrow passages. Consequently, information about the local structure is neglected. Local sampling-based motion planners, on the other hand, take sequential decisions of random walks to sample valid trajectories in configuration space. However, current approaches do not adapt their strategies according to the successes and failures of past samples. In this work, we introduce a local sampling-based motion planner with a Bayesian update scheme for modelling a sampling proposal distribution. The proposal distribution is sequentially updated based on previous sample outcomes, consequently shaping the proposal distribution according to local obstacles and constraints in the configuration space. Thus, through learning from past observed outcomes, we can maximise the likelihood of sampling in regions that have a higher probability of forming trajectories within narrow passages.
Bayesian Machine Learning
In the previous post we learnt about the importance of latent variables in Bayesian modelling. Starting from this post, we will see Bayesian methods in action. We will walk through different aspects of machine learning and see how Bayesian methods help us design solutions, and what additional capabilities and insights we gain by using them. The sections that follow cover what is generally known as Bayesian inference.
Consequences of Model Misspecification for Maximum Likelihood Estimation with Missing Data
Researchers are often faced with the challenge of developing statistical models with incomplete data. Exacerbating this situation is the possibility that either the researcher's complete-data model or the model of the missing-data mechanism is misspecified. In this article, we create a formal theoretical framework for developing statistical models and detecting model misspecification in the presence of incomplete data where maximum likelihood estimates are obtained by maximizing the observable-data likelihood function when the missing-data mechanism is assumed ignorable. First, we provide sufficient regularity conditions on the researcher's complete-data model to characterize the asymptotic behavior of maximum likelihood estimates in the simultaneous presence of both missing data and model misspecification. These results are then used to derive robust hypothesis testing methods for possibly misspecified models in the presence of Missing at Random (MAR) or Missing Not at Random (MNAR) missing data.
Bayes' Theorem For Bae
Bayes' Theorem is something that confuses and frustrates many, but is not as awful as many make it out to be. While the formula for "Bae's Theorem" given in the graphic above is silly, doesn't make mathematical sense, and borders on being NSFW, it does help illustrate what the problem statement is (something that throws many, as intuitively it seems kind of backwards). Given that Netflix is occurring, one would want to know the probability of 'chill', NOT the other way around. Granted, the right side of the equation is complete nonsense, but the left side is actually a good mnemonic device, especially given that part of the reason so many students tune out while learning mathematics is the dry sterility of the presentation. The theorem essentially states that the probability of event A given event B is equal to the probability of B given A, times the probability of A, divided by the probability of B. This seems very complex until it is broken down bit by bit.
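To break the statement down, here is a minimal Python sketch of the theorem, P(A | B) = P(B | A) P(A) / P(B). The function name `posterior` and the numbers used below are illustrative assumptions, not taken from the post; P(B) is expanded with the law of total probability over A and not-A.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using Bayes' theorem.

    prior_a          -- P(A), the probability of A before seeing B
    p_b_given_a      -- P(B | A)
    p_b_given_not_a  -- P(B | not A)
    """
    # P(B) via the law of total probability over A and not-A.
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    # Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
    return p_b_given_a * prior_a / evidence


# Hypothetical numbers: A is rare (1%), B is likely when A holds (90%)
# and unlikely otherwise (5%).
p = posterior(prior_a=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05)
print(round(p, 4))  # 0.1538
```

Even though B is far more likely under A than under not-A, the posterior stays modest because A was rare to begin with, which is the "backwards" intuition the theorem corrects.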
A Variational Bayes Approach to Adaptive Radio Tomography
Lee, Donghoon, Giannakis, Georgios B.
Radio tomographic imaging (RTI) is an emerging technology for localization of physical objects in a geographical area covered by wireless networks. With attenuation measurements collected at spatially distributed sensors, RTI capitalizes on spatial loss fields (SLFs) measuring the absorption of radio frequency waves at spatial locations along the propagation path. These SLFs can be utilized for interference management in wireless communication networks, environmental monitoring, and survivor localization after natural disasters such as earthquakes. Key to the success of RTI is to accurately model shadowing as the weighted line integral of the SLF. To learn the SLF exhibiting statistical heterogeneity induced by spatially diverse environments, the present work develops a Bayesian framework entailing a piecewise homogeneous SLF with an underlying hidden Markov random field model. Utilizing variational Bayes techniques, the novel approach yields efficient field estimators at affordable complexity. A data-adaptive sensor selection strategy is also introduced to collect informative measurements for effective reconstruction of the SLF. Numerical tests using synthetic and real datasets demonstrate the capabilities of the proposed approach to radio tomography and channel-gain estimation.
Mixture Probabilistic Principal Geodesic Analysis
Zhang, Youshan, Xing, Jiarui, Zhang, Miaomiao
Dimensionality reduction on Riemannian manifolds is challenging due to the complex nonlinear data structures. While probabilistic principal geodesic analysis (PPGA) has been proposed to generalize conventional principal component analysis (PCA) onto manifolds, its effectiveness is limited to data with a single modality. In this paper, we present a novel Gaussian latent variable model that provides a unique way to integrate multiple PGA models into a maximum-likelihood framework. This leads to a well-defined mixture model of probabilistic principal geodesic analysis (MPPGA) on sub-populations, where parameters of the principal subspaces are automatically estimated by employing an Expectation Maximization algorithm. We further develop a mixture Bayesian PGA (MBPGA) model that automatically reduces data dimensionality by suppressing irrelevant principal geodesics. We demonstrate the advantages of our model in the contexts of clustering and statistical shape analysis, using synthetic sphere data, real corpus callosum, and mandible data from human brain magnetic resonance (MR) and CT images.
Accelerated Information Gradient flow
We present a systematic framework for Nesterov's accelerated gradient flows in spaces of probabilities equipped with information metrics. Two metrics are considered: the Fisher-Rao metric and the Wasserstein-$2$ metric. For the Wasserstein-$2$ metric case, we prove the convergence properties of the accelerated gradient flows, and introduce their formulations in Gaussian families. Furthermore, we propose a practical discrete-time algorithm in particle implementations with an adaptive restart technique. We formulate a novel bandwidth selection method, which learns the Wasserstein-$2$ gradient direction from Brownian-motion samples. Experimental results including Bayesian inference show the strength of the current method compared with the state-of-the-art.
Meta Learning with Relational Information for Short Sequences
Xie, Yujia, Jiang, Haoming, Liu, Feng, Zhao, Tuo, Zha, Hongyuan
This paper proposes a new meta-learning method, named HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences), for learning heterogeneous point process models from short event sequence data along with a relational network. Specifically, we propose a hierarchical Bayesian mixture Hawkes process model, which naturally incorporates the relational information among sequences into point process modeling. Compared with existing methods, our model can capture the underlying mixed-community patterns of the relational network, which simultaneously encourages knowledge sharing among sequences and facilitates adaptive learning for each individual sequence. We further propose an efficient stochastic variational meta expectation maximization algorithm that can scale to large problems. Numerical experiments on both synthetic and real data show that HARMLESS outperforms existing methods in terms of predicting future events.
Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra
Halloran, John T., Rocke, David M.
The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample. In this work, we greatly expand the parameter learning capabilities of a dynamic Bayesian network (DBN) peptide-scoring algorithm, Didea, by deriving emission distributions for which its conditional log-likelihood scoring function remains concave. We show that this class of emission distributions, called Convex Virtual Emissions (CVEs), naturally generalizes the log-sum-exp function while rendering both maximum likelihood estimation and conditional maximum likelihood estimation concave for a wide range of Bayesian networks. Utilizing CVEs in Didea allows efficient learning of a large number of parameters while ensuring global convergence, in stark contrast to Didea's previous parameter learning framework (which could only learn a single parameter using a costly grid search) and other trainable models (which only ensure convergence to local optima). The newly trained scoring function substantially outperforms the state-of-the-art in both scoring function accuracy and downstream Fisher kernel analysis. Furthermore, we significantly improve Didea's runtime performance through successive optimizations to its message passing schedule and derive explicit connections between Didea's new concave score and related MS/MS scoring functions.
Distributionally Robust Language Modeling
Oren, Yonatan, Sagawa, Shiori, Hashimoto, Tatsunori B., Liang, Percy
Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the knowledge of the test distribution, we propose an approach which trains a model that performs well over a wide range of potential test distributions. In particular, we derive a new distributionally robust optimization (DRO) procedure which minimizes the loss of the model over the worst-case mixture of topics with sufficient overlap with the training distribution. Our approach, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.
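As an illustration of the worst-case-mixture idea, the following sketch computes a generic discrete conditional value at risk over per-topic losses. It is not the paper's training procedure: the function name `discrete_cvar`, the overlap parameter `alpha`, and all numbers are assumptions made for this example. It assumes the adversary may reweight topic probabilities `probs` to any mixture `q` with `q[i] <= probs[i] / alpha`, and returns the largest expected loss under that constraint.

```python
def discrete_cvar(losses, probs, alpha):
    """Worst-case expected loss over mixtures q with q[i] <= probs[i] / alpha.

    losses -- per-topic losses (hypothetical values)
    probs  -- training-time topic probabilities, summing to 1
    alpha  -- overlap level in (0, 1]; alpha = 1 recovers the plain average
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Greedy solution: put as much mass as allowed on the highest-loss topics.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    remaining = 1.0  # mixture mass still to be assigned
    total = 0.0
    for i in order:
        weight = min(probs[i] / alpha, remaining)
        total += weight * losses[i]
        remaining -= weight
        if remaining <= 0.0:
            break
    return total


# Hypothetical per-topic losses and topic probabilities.
losses = [4.0, 2.0, 1.0]
probs = [0.2, 0.3, 0.5]
average_loss = discrete_cvar(losses, probs, alpha=1.0)  # ordinary mean
robust_loss = discrete_cvar(losses, probs, alpha=0.5)   # worst-case mixture
print(average_loss, robust_loss)
```

With `alpha = 1.0` the constraint forces `q` to equal `probs`, so the result is the ordinary average loss; smaller values of `alpha` let the mixture concentrate on the hardest topics, so the reported loss can only grow.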