Bayesian Learning: Overviews

A Framework for Testing Identifiability of Bayesian Models of Perception

Neural Information Processing Systems

Bayesian observer models are very effective in describing human performance in perceptual tasks, so much so that they are trusted to faithfully recover hidden mental representations of priors, likelihoods, or loss functions from the data. However, the intrinsic degeneracy of the Bayesian framework, as multiple combinations of elements can yield empirically indistinguishable results, prompts the question of model identifiability. We propose a novel framework for a systematic testing of the identifiability of a significant class of Bayesian observer models, with practical applications for improving experimental design. We examine the theoretical identifiability of the inferred internal representations in two case studies. First, we show which experimental designs work better to remove the underlying degeneracy in a time interval estimation task.

Machine Learning for Recommender Systems - A Primer


The growth of ecommerce in the recent past can only be described as explosive and sweeping across the planet. According to a 2016 study, half of all dollars spent online in America belong to Amazon. And consider this, Recommendation Engines alone drive 35% of that revenue. But it is not ecommerce alone that's reaping the huge benefits that recommendation engines have to offer. Direct to device streaming services such as Netflix, Spotify among others, analyze user behavior almost to a micro moment level, then gather data surrounding similar users who are likely to buy the same items based on their browsing history, and provide that much needed nudge to move on to the next purchase on the platform.

Causality-based Feature Selection: Methods and Evaluations Artificial Intelligence

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. It has been shown that the knowledge about the causal relationships between features and the class variable has potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted greater attentions and many algorithms have been proposed. In this paper, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in the research area and make it easy for the comparisons between new methods and existing ones, we develop the first open-source package, called CausalFS, which consists of most of the representative causality-based feature selection algorithms (available at Using CausalFS, we conduct extensive experiments to compare the representative algorithms with both synthetic and real-world data sets. Finally, we discuss some challenging problems to be tackled in future causality-based feature selection research.

Identifying Hidden Buyers in Darknet Markets via Dirichlet Hawkes Process Machine Learning

The darknet markets are notorious black markets in cyberspace, which involve selling or brokering drugs, weapons, stolen credit cards, and other illicit goods. To combat illicit transactions in the cyberspace, it is important to analyze the behaviors of participants in darknet markets. Currently, many studies focus on studying the behavior of vendors. However, there is no much work on analyzing buyers. The key challenge is that the buyers are anonymized in darknet markets. For most of the darknet markets, We only observe the first and last digits of a buyer's ID, such as ``a**b''. To tackle this challenge, we propose a hidden buyer identification model, called UNMIX, which can group the transactions from one hidden buyer into one cluster given a transaction sequence from an anonymized ID. UNMIX is able to model the temporal dynamics information as well as the product, comment, and vendor information associated with each transaction. As a result, the transactions with similar patterns in terms of time and content group together as the subsequence from one hidden buyer. Experiments on the data collected from three real-world darknet markets demonstrate the effectiveness of our approach measured by various clustering metrics. Case studies on real transaction sequences explicitly show that our approach can group transactions with similar patterns into the same clusters.

Protecting from Malware Obfuscation Attacks through Adversarial Risk Analysis Machine Learning

Standard algorithms in detection systems perform insufficiently when dealing with malware passed through obfuscation tools. We illustrate this studying in detail an open source metamorphic software, making use of a hybrid framework to obtain the relevant features from binaries. We then provide an improved alternative solution based on adversarial risk analysis which we illustrate describe with an example. KEYWORDS: Adversarial Risk Analysis, Malware Obfuscation, Cybersecurity 1 INTRODUCTION The digital era is bringing along new global threats among which cybersecurity related ones emerge as truly worrisome, see for example the evolution of the Global Risks Map from the World Economic Forum (2017, 2018, 2019). Indeed, the operation of critical cyber infrastructures relies on components which could be cyber attacked, both incidentally and intentionally, suffering major performance degradation, Rao et al. (2016).

Parameter elimination in particle Gibbs sampling Machine Learning

Bayesian inference in state-space models is challenging due to high-dimensional state trajectories. A viable approach is particle Markov chain Monte Carlo, combining MCMC and sequential Monte Carlo to form "exact approximations" to otherwise intractable MCMC methods. The performance of the approximation is limited to that of the exact method. We focus on particle Gibbs and particle Gibbs with ancestor sampling, improving their performance beyond that of the underlying Gibbs sampler (which they approximate) by marginalizing out one or more parameters. This is possible when the parameter prior is conjugate to the complete data likelihood. Marginalization yields a non-Markovian model for inference, but we show that, in contrast to the general case, this method still scales linearly in time. While marginalization can be cumbersome to implement, recent advances in probabilistic programming have enabled its automation. We demonstrate how the marginalized methods are viable as efficient inference backends in probabilistic programming, and demonstrate with examples in ecology and epidemiology.

Learning from both experts and data Machine Learning

In this work we study the problem of inferring a discrete probability distribution using both expert knowledge and empirical data. This is an important issue for many applications where the scarcity of data prevents a purely empirical approach. In this context, it is common to rely first on an initial domain knowledge a priori before proceeding to an online data acquisition. We are particularly interested in the intermediate regime where we do not have enough data to do without the initial expert a priori of the experts, but enough to correct it if necessary. We present here a novel way to tackle this issue with a method providing an objective way to choose the weight to be given to experts compared to data. We show, both empirically and theoretically, that our proposed estimator is always more efficient than the best of the two models (expert or data) within a constant.

The Empirical Derivation of the Bayesian Formula Open Data Science Conference


Editor's note: James is a speaker for ODSC London this November! Be sure to check out his talk, "The How, Why, and When of Replacing Engineering Work with Compute Power" there. Deep learning has been made practical through modern computing power, but it is not the only technique benefiting from this large increase in power. Bayesian inference is up and coming technique whose recent progress is powered by the increase in computing power. We can explain the mathematical expression of Bayes formula using an example similar to the great Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference to a financial context and using mathematical concepts intuitively arise from code.

Distributed Bayesian Computation for Model Choice Machine Learning

We propose a general method for distributed Bayesian model choice, where each worker has access only to non-overlapping subsets of the data. Our approach approximates the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset generating a model evidence per subset. The model evidences per worker are then consistently combined using a novel approach which corrects for the splitting using summary statistics of the generated samples. This divide-and-conquer approach allows Bayesian model choice in the large data setting, exploiting all available information but limiting communication between workers. Our work thereby complements the work on consensus Monte Carlo (Scott et al., 2016) by explicitly enabling model choice. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple models within one run.

An Optimal Transport Formulation of the Ensemble Kalman Filter Machine Learning

Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N\rightarrow\infty$. For a certain simplified filtering problem, analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in recent literature.