Bayesian Learning

Implementing Naive Bayes for Sentiment Analysis in Python


The Naive Bayes Classifier is a well known machine learning classifier with applications in Natural Language Processing (NLP) and other areas. Despite its simplicity, it is able to achieve above average performance in different tasks like sentiment analysis. Today we will elaborate on the core principles of this model and then implement it in Python. In the end, we will see how well we do on a dataset of 2000 movie reviews. The math behind this model isn't particularly difficult to understand if you are familiar with some of the math notation.

Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with $\beta$-Divergences

Neural Information Processing Systems

We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $\beta$-divergences. The resulting inference procedure is doubly robust for both the predictive and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: Firstly, we make GBI scalable using Structural Variational approximations that are exact as $\beta \to 0$. Secondly, we give a principled way of choosing the divergence parameter $\beta$ by minimizing expected predictive loss on-line. Reducing False Discovery Rates of \CPs from up to 99\% to 0\% on real world data, this offers the state of the art.

Implicit Reparameterization Gradients

Neural Information Processing Systems

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distributions. We introduce an alternative approach to computing reparameterization gradients based on implicit differentiation and demonstrate its broader applicability by applying it to Gamma, Beta, Dirichlet, and von Mises distributions, which cannot be used with the classic reparameterization trick. Our experiments show that the proposed approach is faster and more accurate than the existing gradient estimators for these distributions.

Disentangling group and link persistence in Dynamic Stochastic Block models Machine Learning

We study the inference of a model of dynamic networks in which both communities and links keep memory of previous network states. By considering maximum likelihood inference from single snapshot observations of the network, we show that link persistence makes the inference of communities harder, decreasing the detectability threshold, while community persistence tends to make it easier. We analytically show that communities inferred from single network snapshot can share a maximum overlap with the underlying communities of a specific previous instant in time. This leads to time-lagged inference: the identification of past communities rather than present ones. Finally we compute the time lag and propose a corrected algorithm, the Lagged Snapshot Dynamic (LSD) algorithm, for community detection in dynamic networks. We analytically and numerically characterize the detectability transitions of such algorithm as a function of the memory parameters of the model and we make a comparison with a full dynamic inference.

Adams Conditioning and Likelihood Ratio Transfer Mediated Inference Artificial Intelligence

Bayesian inference as applied in a legal setting is about belief transfer and involves a plurality of agents and communication protocols. A forensic expert (FE) may communicate to a trier of fact (TOF) first its value of a certain likelihood ratio with respect to FE's belief state as represented by a probability function on FE's proposition space. Subsequently FE communicates its recently acquired confirmation that a certain evidence proposition is true. Then TOF performs likelihood ratio transfer mediated reasoning thereby revising their own belief state. The logical principles involved in likelihood transfer mediated reasoning are discussed in a setting where probabilistic arithmetic is done within a meadow, and with Adams conditioning placed in a central role.

Bayesian Mean-parameterized Nonnegative Binary Matrix Factorization Machine Learning

Binary data matrices can represent many types of data such as social networks, votes or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the $[0,1]$ range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage to lead to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parametrization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonparametric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods in multiple datasets for different tasks such as dictionary learning and prediction of missing data. Experiments show that our methods provide similar or superior results than the state of the art, while automatically detecting the number of relevant components.

Bayesian deep neural networks for low-cost neurophysiological markers of Alzheimer's disease severity Machine Learning

As societies around the world are ageing, the number of Alzheimer's disease (AD) patients is rapidly increasing. To date, no low-cost, non-invasive biomarkers have been established to advance the objectivization of AD diagnosis and progression assessment. Here, we utilize Bayesian neural networks to develop a multivariate predictor for AD severity using a wide range of quantitative EEG (QEEG) markers. The Bayesian treatment of neural networks both automatically controls model complexity and provides a predictive distribution over the target function, giving uncertainty bounds for our regression task. It is therefore well suited to clinical neuroscience, where data sets are typically sparse and practitioners require a precise assessment of the predictive uncertainty. We use data of one of the largest prospective AD EEG trials ever conducted to demonstrate the potential of Bayesian deep learning in this domain, while comparing two distinct Bayesian neural network approaches, i.e., Monte Carlo dropout and Hamiltonian Monte Carlo.

Local Probabilistic Model for Bayesian Classification: a Generalized Local Classification Model Machine Learning

In Bayesian classification, it is important to establish a probabilistic model for each class for likelihood estimation. Most of the previous methods modeled the probability distribution in the whole sample space. However, real-world problems are usually too complex to model in the whole sample space; some fundamental assumptions are required to simplify the global model, for example, the class conditional independence assumption for naive Bayesian classification. In this paper, with the insight that the distribution in a local sample space should be simpler than that in the whole sample space, a local probabilistic model established for a local region is expected much simpler and can relax the fundamental assumptions that may not be true in the whole sample space. Based on these advantages we propose establishing local probabilistic models for Bayesian classification. In addition, a Bayesian classifier adopting a local probabilistic model can even be viewed as a generalized local classification model; by tuning the size of the local region and the corresponding local model assumption, a fitting model can be established for a particular classification problem. The experimental results on several real-world datasets demonstrate the effectiveness of local probabilistic models for Bayesian classification.

Bayesian Spectral Deconvolution Based on Poisson Distribution: Bayesian Measurement and Virtual Measurement Analytics (VMA) Machine Learning

In this paper, we propose a new method of Bayesian measurement for spectral deconvolution, which regresses spectral data into the sum of unimodal basis function such as Gaussian or Lorentzian functions. Bayesian measurement is a framework for considering not only the target physical model but also the measurement model as a probabilistic model, and enables us to estimate the parameter of a physical model with its confidence interval through a Bayesian posterior distribution given a measurement data set. The measurement with Poisson noise is one of the most effective system to apply our proposed method. Since the measurement time is strongly related to the signal-to-noise ratio for the Poisson noise model, Bayesian measurement with Poisson noise model enables us to clarify the relationship between the measurement time and the limit of estimation. In this study, we establish the probabilistic model with Poisson noise for spectral deconvolution. Bayesian measurement enables us to perform virtual and computer simulation for a certain measurement through the established probabilistic model. This property is called "Virtual Measurement Analytics(VMA)" in this paper. We also show that the relationship between the measurement time and the limit of estimation can be extracted by using the proposed method in a simulation of synthetic data and real data for XPS measurement of MoS$_2$.

Efficient learning of smooth probability functions from Bernoulli tests with guarantees Machine Learning

We study the fundamental problem of learning an unknown, smooth probability function via point-wise Bernoulli tests. We provide the first scalable algorithm for efficiently solving this problem with rigorous guarantees. In particular, we prove the convergence rate of our posterior update rule to the true probability function in L2-norm. Moreover, we allow the Bernoulli tests to depend on contextual features, and provide a modified inference engine with provable guarantees for this novel setting. Numerical results show that the empirical convergence rates match the theory, and illustrate the superiority of our approach in handling contextual features over the state-of-the-art.