Bayesian Inference
Poisson-Randomized Gamma Dynamical Systems
Schein, Aaron, Linderman, Scott W., Zhou, Mingyuan, Blei, David M., Wallach, Hanna
This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness. The PRGDS is based on a new motif in Bayesian latent variable modeling, an alternating chain of discrete Poisson and continuous gamma latent states that is analytically convenient and computationally tractable. This motif yields closed-form complete conditionals for all variables by way of the Bessel distribution and a novel discrete distribution that we call the shifted confluent hypergeometric distribution. We draw connections to closely related models and compare the PRGDS to these models in studies of real-world count data sets of text, international events, and neural spike trains. We find that a sparse variant of the PRGDS, which allows the continuous gamma latent states to take values of exactly zero, often obtains better predictive performance than other models and is uniquely capable of inferring latent structures that are highly localized in time.
Ensemble Quantile Classifier
Both the median-based classifier and the quantile-based classifier are useful for discriminating high-dimensional data with heavy-tailed or skewed inputs. But these methods are restricted as they assign equal weight to each variable in an unregularized way. The ensemble quantile classifier is a more flexible regularized classifier that provides better performance with high-dimensional data, asymmetric data or when there are many irrelevant extraneous inputs. The improved performance is demonstrated by a simulation study as well as an application to text categorization. It is proven that the estimated parameters of the ensemble quantile classifier consistently estimate the minimal population loss under suitable general model assumptions. It is also shown that the ensemble quantile classifier is Bayes optimal under suitable assumptions with asymmetric Laplace distribution inputs.
Approximate Bayesian Computation with the Sliced-Wasserstein Distance
Nadjahi, Kimia, De Bortoli, Valentin, Durmus, Alain, Badeau, Roland, Şimşekli, Umut
Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined beforehand and might induce a loss of information, which has been shown to deteriorate the quality of the approximation. To overcome this problem, Wasserstein-ABC has been recently proposed, and compares the datasets via the Wasserstein distance between their empirical distributions, but does not scale well to the dimension or the number of samples. We propose a new ABC technique, called Sliced-Wasserstein ABC and based on the Sliced-Wasserstein distance, which has better computational and statistical properties. We derive two theoretical results showing the asymptotical consistency of our approach, and we illustrate its advantages on synthetic data and an image denoising task.
The Study of Machine Learning Models in Predicting the Intention of Adolescents to Smoke Cigarettes
Nam, Seung Joon, Kim, Han Min, Kang, Thomas, Park, Cheol Young
The use of electronic cigarette (e-cigarette) is increasing among adolescents. This is problematic since consuming nicotine at an early age can cause harmful effects in developing teenager's brain and health. Additionally, the use of e-cigarette has a possibility of leading to the use of cigarettes, which is more severe. There were many researches about e-cigarette and cigarette that mostly focused on finding and analyzing causes of smoking using conventional statistics. However, there is a lack of research on developing prediction models, which is more applicable to anti-smoking campaign, about e-cigarette and cigarette. In this paper, we research the prediction models that can be used to predict an individual e-cigarette user's (including non-e-cigarette users) intention to smoke cigarettes, so that one can be early informed about the risk of going down the path of smoking cigarettes. To construct the prediction models, five machine learning (ML) algorithms are exploited and tested for their accuracy in predicting the intention to smoke cigarettes among never smokers using data from the 2018 National Youth Tobacco Survey (NYTS). In our investigation, the Gradient Boosting Classifier, one of the prediction models, shows the highest accuracy out of all the other models. Also, with the best prediction model, we made a public website that enables users to input information to predict their intentions of smoking cigarettes.
Large-Scale Characterization and Segmentation of Internet Path Delays with Infinite HMMs
Mouchet, Maxime, Vaton, Sandrine, Chonavel, Thierry, Aben, Emile, Hertog, Jasper den
Round-Trip Times are one of the most commonly collected performance metrics in computer networks. Measurement platforms such as RIPE Atlas provide researchers and network operators with an unprecedented amount of historical Internet delay measurements. It would be very useful to automate the processing of these measurements (statistical characterization of paths performance, change detection, recognition of recurring patterns, etc.). Humans are pretty good at finding patterns in network measurements but it can be difficult to automate this to enable many time series being processed at the same time. In this article we introduce a new model, the HDP-HMM or infinite hidden Markov model, whose performance in trace segmentation is very close to human cognition. This is obtained at the cost of a greater complexity and the ambition of this article is to make the theory accessible to network monitoring and management researchers. We demonstrate that this model provides very accurate results on a labeled dataset and on RIPE Atlas and CAIDA MANIC data. This method has been implemented in Atlas and we introduce the publicly accessible Web API.
Beyond the proton drip line: Bayesian analysis of proton-emitting nuclei
Neufcourt, Léo, Cao, Yuchen, Giuliani, Samuel, Nazarewicz, Witold, Olsen, Erik, Tarasov, Oleg B.
The limits of the nuclear landscape are determined by nuclear binding energies. Beyond the proton drip lines, where the separation energy becomes negative, there is not enough binding energy to prevent protons from escaping the nucleus. Predicting properties of unstable nuclear states in the vast territory of proton emitters poses an appreciable challenge for nuclear theory as it often involves far extrapolations. In addition, significant discrepancies between nuclear models in the proton-rich territory call for quantified predictions. With the help of Bayesian methodology, we mix a family of nuclear mass models corrected with statistical emulators trained on the experimental mass measurements, in the proton-rich region of the nuclear chart. Separation energies were computed within nuclear density functional theory using several Skyrme and Gogny energy density functionals. We also considered mass predictions based on two models used in astrophysical studies. Quantified predictions were obtained for each model using Bayesian Gaussian processes trained on separation-energy residuals and combined via Bayesian model averaging. We obtained a good agreement between averaged predictions of statistically corrected models and experiment. In particular, we quantified model results for one- and two-proton separation energies and derived probabilities of proton emission. This information enabled us to produce a quantified landscape of proton-rich nuclei. The most promising candidates for two-proton decay studies have been identified. The methodology used in this work has broad applications to model-based extrapolations of various nuclear observables. It also provides a reliable uncertainty quantification of theoretical predictions.
A Gentle Introduction to Logistic Regression With Maximum Likelihood Estimation
Logistic regression is a model for binary classification predictive modeling. The parameters of a logistic regression model can be estimated by the probabilistic framework called maximum likelihood estimation. Under this framework, a probability distribution for the target variable (class label) must be assumed and then a likelihood function defined that calculates the probability of observing the outcome given the input data and the model. This function can then be optimized to find the set of parameters that results in the largest sum likelihood over the training dataset. The maximum likelihood approach to fitting a logistic regression model both aids in better understanding the form of the logistic regression model and provides a template that can be used for fitting classification models more generally.
An Introduction to the Powerful Bayes' Theorem for Data Science Professionals
Probability is at the very core of a lot of data science algorithms. In fact, the solutions to so many data science problems are probabilistic in nature – hence I always advice focusing on learning statistics and probability before jumping into the algorithms. But I've seen a lot of aspiring data scientists shunning statistics, especially Bayesian statistics. It remains incomprehensible to a lot of analysts and data scientists. I'm sure a lot of you are nodding your head at this! Bayes' Theorem, a major aspect of Bayesian Statistics, was created by Thomas Bayes, a monk who lived during the eighteenth century. The very fact that we're still learning about it shows how influential his work has been across centuries!
Conditional Expectation Propagation
Expectation propagation (EP) is a powerful approximate inference algorithm. However, a critical barrier in applying EP is that the moment matching in message updates can be intractable. Handcrafting approximations is usually tricky, and lacks generalizability. Importance sampling is very expensive. While Laplace propagation provides a good solution, it has to run numerical optimizations to find Laplace approximations in every update, which is still quite inefficient. To overcome these practical barriers, we propose conditional expectation propagation (CEP) that performs conditional moment matching given the variables outside each message, and then takes expectation w.r.t the approximate posterior of these variables. The conditional moments are often analytical and much easier to derive. In the most general case, we can use (fully) factorized messages to represent the conditional moments by quadrature formulas. We then compute the expectation of the conditional moments via Taylor approximations when necessary. In this way, our algorithm can always conduct efficient, analytical fixed point iterations. Experiments on several popular models for which standard EP is available or unavailable demonstrate the advantages of CEP in both inference quality and computational efficiency.
Prior specification via prior predictive matching: Poisson matrix factorization and beyond
da Silva, Eliezer de Souza, Kuśmierczyk, Tomasz, Hartmann, Marcelo, Klami, Arto
Hyperparameter optimization for machine learning models is typically carried out by some sort of cross-validation procedure or global optimization, both of which require running the learning algorithm numerous times. We show that for Bayesian hierarchical models there is an appealing alternative that allows selecting good hyperparameters without learning the model parameters during the process at all, facilitated by the prior predictive distribution that marginalizes out the model parameters. We propose an approach that matches suitable statistics of the prior predictive distribution with ones provided by an expert and apply the general concept for matrix factorization models. For some Poisson matrix factorization models we can analytically obtain exact hyperparameters, including the number of factors, and for more complex models we propose a model-independent optimization procedure.