Bayesian Inference
A Primer on Domain Adaptation
Lemberger, Pirmin, Panico, Ivan
Standard supervised machine learning assumes that the distribution of the source samples used to train an algorithm is the same as that of the target samples on which it is supposed to make predictions. However, as any data scientist will confirm, this is hardly ever the case in practice. The set of statistical and numerical methods that deal with such situations is known as domain adaptation, a field with a long and rich history. The myriad of methods available and the unfortunate lack of a clear and universally accepted terminology can, however, make the topic rather daunting for the newcomer. Therefore, rather than aiming at completeness, which leads to exhibiting a tedious catalog of methods, this pedagogical review aims at a coherent presentation of four important special cases: (1) \emph{prior shift}, a situation in which training samples were selected according to their labels without any knowledge of their actual distribution in the target, (2) \emph{covariate shift}, which deals with a situation where training examples were picked according to their features but with some selection bias, (3) \emph{concept shift}, where the dependence of the labels on the features differs between the source and the target, and last but not least (4) \emph{subspace mapping}, which deals with a situation where features in the target have been subjected to an unknown distortion with respect to the source features. In each case we first build an intuition, then provide the appropriate mathematical framework, and finally describe a practical application.
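To make the prior shift case concrete, the following is a minimal sketch, not taken from the review itself. It assumes that the class-conditional feature distribution is unchanged between source and target, and that the target class priors are known or have been estimated separately; under those assumptions the source posteriors can be reweighted by the ratio of target to source priors and renormalized. The function name and all numbers are hypothetical.

```python
import numpy as np

def adjust_for_prior_shift(source_posteriors, source_priors, target_priors):
    """Reweight source posteriors p_S(y|x) by p_T(y) / p_S(y), then renormalize.

    Assumes p(x|y) is identical in source and target (prior shift only).
    """
    posteriors = np.asarray(source_posteriors, dtype=float)
    weights = (np.asarray(target_priors, dtype=float)
               / np.asarray(source_priors, dtype=float))
    unnormalized = posteriors * weights
    # Each row is one sample; normalize so class probabilities sum to 1.
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Hypothetical example: a classifier trained on balanced classes,
# deployed on a target population where class 1 is rare.
source_posteriors = np.array([[0.5, 0.5],
                              [0.2, 0.8]])
adjusted = adjust_for_prior_shift(source_posteriors,
                                  source_priors=[0.5, 0.5],
                                  target_priors=[0.9, 0.1])
print(adjusted)
```

In this toy setting a sample that the source classifier considers a coin flip is pulled towards the class that dominates the target, which is the intuition behind correcting for prior shift.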
Bayesian nonparametric shared multi-sequence time series segmentation
Mikheeva, Olga, Kazlauskaite, Ieva, Kjellström, Hedvig, Ek, Carl Henrik
In this paper, we introduce a method for segmenting time series data using tools from Bayesian nonparametrics. We consider the task of temporal segmentation of a set of time series data into representative stationary segments. We use Gaussian process (GP) priors to impose our knowledge about the characteristics of the underlying stationary segments, and use a nonparametric distribution to partition the sequences into such segments, formulated in terms of a prior distribution on segment length. Given the segmentation, the model can be viewed as a variant of a Gaussian mixture model where the mixture components are described using the covariance function of a GP. We demonstrate the effectiveness of our model on synthetic data as well as on real time-series data of heartbeats where the task is to segment the indicative types of beats and to classify the heartbeat recordings into classes that correspond to healthy and abnormal heart sounds.
Feature selection in machine learning: R\'enyi min-entropy vs Shannon entropy
Palamidessi, Catuscia, Romanelli, Marco
Feature selection, in the context of machine learning, is the process of separating the highly predictive features from those that might be irrelevant or redundant. Information theory has been recognized as a useful concept for this task, as the prediction power stems from the correlation, i.e., the mutual information, between features and labels. Many algorithms for feature selection in the literature have adopted the Shannon-entropy-based mutual information. In this paper, we explore the possibility of using R\'enyi min-entropy instead. In particular, we propose an algorithm based on a notion of conditional R\'enyi min-entropy that has been recently adopted in the field of security and privacy, and which is strictly related to the Bayes error. We prove that in general the two approaches are incomparable, in the sense that we can construct datasets on which the R\'enyi-based algorithm performs better than the corresponding Shannon-based one, and datasets on which the situation is reversed. In practice, however, on datasets of real data the R\'enyi-based algorithm tends to outperform the other one. We have carried out several experiments on the BASEHOCK, SEMEION, and GISETTE datasets, and in all of them we have observed that the R\'enyi-based algorithm gives better results.
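The two quantities being compared can be illustrated on a small joint distribution. The sketch below is an assumption-laden illustration rather than the authors' algorithm: it takes the conditional min-entropy to be the negative base-2 logarithm of the Bayes accuracy (the form commonly used in the security and privacy literature), and the joint table and function names are hypothetical.

```python
import numpy as np

def shannon_conditional_entropy(joint):
    """H(Y|X) in bits for a joint table with rows = feature values, columns = labels."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)
    # p(y|x); entries stay 0 wherever the feature value has zero probability.
    cond = np.divide(joint, p_x, out=np.zeros_like(joint), where=p_x > 0)
    mask = joint > 0
    return float(-(joint[mask] * np.log2(cond[mask])).sum())

def bayes_accuracy(joint):
    """Probability of guessing the label correctly with the best rule given X."""
    return float(np.asarray(joint, dtype=float).max(axis=1).sum())

def conditional_min_entropy(joint):
    """Assumed definition: minus log2 of the Bayes accuracy."""
    return float(-np.log2(bayes_accuracy(joint)))

# Hypothetical joint distribution of a binary feature X and a binary label Y.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
shannon = shannon_conditional_entropy(joint)
min_entropy = conditional_min_entropy(joint)
bayes_error = 1.0 - bayes_accuracy(joint)
print(shannon, min_entropy, bayes_error)
```

A greedy selector would rank candidate features by how much they reduce one of these conditional entropies; because the min-entropy is a monotone function of the Bayes accuracy, ranking by it amounts to ranking by Bayes error on the empirical distribution.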
Heterogeneous Learning from Demonstration
Paleja, Rohan, Gombolay, Matthew
The development of human-robot systems able to leverage the strengths of both humans and their robotic counterparts has been greatly sought after because of the foreseen, broad-ranging impact across industry and research. We believe the true potential of these systems cannot be reached unless the robot is able to act with a high level of autonomy, reducing the burden of manual tasking or teleoperation. To achieve this level of autonomy, robots must be able to work fluidly with their human partners, inferring their needs without explicit commands. This inference requires the robot to be able to detect and classify the heterogeneity of its partners. We propose a framework for learning from heterogeneous demonstration based upon Bayesian inference and evaluate a suite of approaches on a real-world dataset of gameplay from StarCraft II. This evaluation provides evidence that our Bayesian approach can outperform conventional methods by up to 12.8%. Index Terms: Learning from Demonstration; Human-Robot Interaction; Human-Robot Teaming; Deep Learning.
Estimating Aggregate Properties In Relational Networks With Unobserved Data
Embar, Varun, Srinivasan, Sriram, Getoor, Lise
Aggregate network properties such as cluster cohesion and the number of bridge nodes can be used to glean insights about a network's community structure, the spread of influence, and the resilience of the network to faults. Efficiently computing network properties when the network is fully observed has received significant attention (Wasserman and Faust 1994; Cook and Holder 2006); however, the problem of computing aggregate network properties when attribute data are missing has received little attention. Computing these properties for networks with missing attributes involves performing inference over the network. Statistical relational learning (SRL) and graph neural networks (GNNs) are two classes of machine learning approaches well suited for inferring missing attributes in a graph. In this paper, we study the effectiveness of these approaches in estimating aggregate properties on networks with missing attributes. We compare two SRL approaches and three GNNs. For these approaches we estimate these properties using point estimates such as MAP and mean. For SRL-based approaches that can infer a joint distribution over the missing attributes, we also estimate these properties as an expectation over the distribution. To compute the expectation tractably for probabilistic soft logic, one of the SRL approaches that we study, we introduce a novel sampling framework. In the experimental evaluation, using three benchmark datasets, we show that SRL-based approaches tend to outperform GNN-based approaches both in computing aggregate properties and in predictive accuracy. Specifically, we show that estimating the aggregate properties as an expectation over the joint distribution outperforms point estimates.
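The difference between a point estimate and an expectation of an aggregate property can be seen on a toy graph. The sketch below is illustrative only and does not reproduce the paper's sampling framework: it assumes binary node labels with independent marginal probabilities (a simplification of a true joint distribution), uses a hypothetical aggregate (the number of edges whose endpoints share a label), and computes the expectation by exact enumeration, which is feasible only for a handful of nodes.

```python
import itertools
import numpy as np

def same_label_edges(edges, labels):
    """Aggregate property: number of edges whose endpoints share a label."""
    return sum(1 for u, v in edges if labels[u] == labels[v])

def map_estimate(edges, prob_one):
    """Plug in the most probable label of each node, then compute the aggregate."""
    labels = [int(p >= 0.5) for p in prob_one]
    return same_label_edges(edges, labels)

def expected_value(edges, prob_one):
    """Exact expectation of the aggregate under independent Bernoulli labels."""
    total = 0.0
    for labels in itertools.product([0, 1], repeat=len(prob_one)):
        # Probability of this complete labelling under the independence assumption.
        weight = np.prod([p if y == 1 else 1.0 - p
                          for p, y in zip(prob_one, labels)])
        total += weight * same_label_edges(edges, labels)
    return float(total)

# Hypothetical triangle graph whose three node labels are all unobserved.
edges = [(0, 1), (1, 2), (0, 2)]
prob_one = [0.6, 0.6, 0.6]
point = map_estimate(edges, prob_one)
expectation = expected_value(edges, prob_one)
print(point, expectation)
```

Here the MAP labelling assigns every node the same label and so counts all three edges, while the expectation accounts for the substantial probability that neighbours disagree. With many unobserved nodes, enumeration would be replaced by sampling from the inferred distribution.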
Particle-Gibbs Sampling For Bayesian Feature Allocation Models
Bouchard-Côté, Alexandre, Roth, Andrew
Bayesian feature allocation models are a popular tool for modelling data with a combinatorial latent structure. Exact inference in these models is generally intractable, and so practitioners typically apply Markov chain Monte Carlo (MCMC) methods for posterior inference. The most widely used MCMC strategies rely on an element-wise Gibbs update of the feature allocation matrix. These element-wise updates can be inefficient as features are typically strongly correlated. To overcome this problem we have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move. However, this sampler is impractical for models with a large number of features, as the computational complexity scales exponentially in the number of features. We develop a Particle Gibbs (PG) sampler that targets the same distribution as the row-wise Gibbs updates, but has computational complexity that only grows linearly in the number of features. We compare the performance of our proposed methods to the standard Gibbs sampler using synthetic data from a range of feature allocation models. Our results suggest that row-wise updates using the PG methodology can significantly improve the performance of samplers for feature allocation models.
Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis
Sevilla-Salcedo, Carlos, Gómez-Verdejo, Vanessa, Olmos, Pablo M.
The Bayesian approach to feature extraction, known as factor analysis (FA), has been widely studied in machine learning to obtain a latent representation of the data. An adequate selection of the probabilities and priors of these Bayesian models allows the model to better adapt to the nature of the data (i.e. heterogeneity, sparsity), obtaining a more representative latent space. The objective of this article is to propose a general FA framework capable of modelling any problem. To do so, we start from the Bayesian Inter-Battery Factor Analysis (BIBFA) model, enhancing it with new functionalities to be able to work with heterogeneous data, include feature selection, and handle missing values as well as semi-supervised problems. The performance of the proposed model, Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis (SSHIBA), has been tested on four different scenarios to evaluate each of its novelties, showing not only great versatility and an interpretability gain, but also outperforming most of the state-of-the-art algorithms.
Deep Bayesian Network for Visual Question Generation
Patro, Badri N., Kurmi, Vinod K., Kumar, Sandeep, Namboodiri, Vinay P.
Generating natural questions from an image is a semantic task that requires using vision and language modalities to learn multimodal representations. Images can have multiple visual and language cues such as places, captions, and tags. In this paper, we propose a principled deep Bayesian learning framework that combines these cues to produce natural questions. We observe that with the addition of more cues and by minimizing uncertainty among the cues, the Bayesian network becomes more confident. We propose Minimizing Uncertainty of Mixture of Cues (MUMC), which minimizes the uncertainty present in a mixture of cue experts for generating probabilistic questions. This is a Bayesian framework and the results show a remarkable similarity to natural questions as validated by a human study. Ablation studies of our model indicate that a subset of cues is inferior at this task and hence the principled fusion of cues is preferred. Further, we observe that the proposed approach substantially improves over state-of-the-art benchmarks on the quantitative metrics (BLEU-n, METEOR, ROUGE, and CIDEr). The project page for Deep Bayesian VQG is available at \url{https://delta-lab-iitk.github.io/BVQG/}
Learning Distributional Programs for Relational Autocompletion
Kumar, Nitesh, Kuzelka, Ondrej, De Raedt, Luc
Relational autocompletion is the problem of automatically filling out some missing fields in a relational database. We tackle this problem within the probabilistic logic programming framework of Distributional Clauses (DC), which supports both discrete and continuous probability distributions. Within this framework, we introduce Dreaml -- an approach to learn both the structure and the parameters of DC programs from databases that may contain missing information. To realize this, Dreaml integrates statistical modeling and distributional clauses with rule learning. The distinguishing features of Dreaml are that it 1) tackles relational autocompletion, 2) learns distributional clauses extended with statistical models, 3) deals with both discrete and continuous distributions, 4) can exploit background knowledge, and 5) uses an expectation-maximization-based algorithm to cope with missing data.
Community Detection in Bipartite Networks with Stochastic Blockmodels
Yen, Tzu-Chi, Larremore, Daniel B.
In bipartite networks, community structures are restricted to being disassortative, in that nodes of one type are grouped according to common patterns of connection with nodes of the other type. This makes the stochastic block model (SBM), a highly flexible generative model for networks with block structure, an intuitive choice for bipartite community detection. However, typical formulations of the SBM do not make use of the special structure of bipartite networks. In this work, we introduce a Bayesian nonparametric formulation of the SBM and a corresponding algorithm to efficiently find communities in bipartite networks without overfitting. The biSBM improves community detection results over general SBMs when data are noisy, improves the model resolution limit by a factor of $\sqrt{2}$, and expands our understanding of the complicated optimization landscape associated with community detection tasks. A direct comparison of certain terms of the prior distributions in the biSBM and a related high-resolution hierarchical SBM also reveals a counterintuitive regime of community detection problems, populated by smaller and sparser networks, where non-hierarchical models outperform their more flexible counterpart.