AITopics | Bayesian Inference

Collaborating Authors

Bayesian Inference

Bayes' Theorem allows a program to infer the probabilities of likely causes from the probabilities of their effects, when what it is given are the probabilities of effects, given the causes.

News Overviews Instructional Materials AI-Alerts Classics

Scaling active inference

Tschantz, Alexander, Baltieri, Manuel, Seth, Anil. K., Buckley, Christopher L.

arXiv.org Artificial IntelligenceNov-24-2019

In reinforcement learning (RL), agents often operate in partially observed and uncertain environments. Model-based RL suggests that this is best achieved by learning and exploiting a probabilistic model of the world. 'Active inference' is an emerging normative framework in cognitive and computational neuroscience that offers a unifying account of how biological agents achieve this. On this framework, inference, learning and action emerge from a single imperative to maximize the Bayesian evidence for a niched model of the world. However, implementations of this process have thus far been restricted to low-dimensional and idealized situations. Here, we present a working implementation of active inference that applies to high-dimensional tasks, with proof-of-principle results demonstrating efficient exploration and an order of magnitude increase in sample efficiency over strong model-free baselines. Our results demonstrate the feasibility of applying active inference at scale and highlight the operational homologies between active inference and current model-based approaches to RL.

active inference, agent, inference, (12 more...)

arXiv.org Artificial Intelligence

1911.10601

Country:

Europe > United Kingdom > England > East Sussex > Brighton (0.04)
Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.87)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
(2 more...)

Add feedback

Differentially Private Federated Variational Inference

Sharma, Mrinank, Hutchinson, Michael, Swaroop, Siddharth, Honkela, Antti, Turner, Richard E.

arXiv.org Artificial IntelligenceNov-24-2019

In many real-world applications of machine learning, data are distributed across many clients and cannot leave the devices they are stored on. Furthermore, each client's data, computational resources and communication constraints may be very different. This setting is known as federated learning, in which privacy is a key concern. Differential privacy is commonly used to provide mathematical privacy guarantees. This work, to the best of our knowledge, is the first to consider federated, differentially private, Bayesian learning. We build on Partitioned Variational Inference (PVI) which was recently developed to support approximate Bayesian inference in the federated setting. We modify the client-side optimisation of PVI to provide an (${\epsilon}$, ${\delta}$)-DP guarantee. We show that it is possible to learn moderately private logistic regression models in the federated setting that achieve similar performance to models trained non-privately on centralised data.

algorithm, ariational inference, pvi, (16 more...)

arXiv.org Artificial Intelligence

1911.10563

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Canada (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Add feedback

Privacy-preserving parametric inference: a case for robust statistics

Avella-Medina, Marco

arXiv.org Machine LearningNov-22-2019

Differential privacy is a cryptographically-motivated approach to privacy that has become a very active field of research over the last decade in theoretical computer science and machine learning. In this paradigm one assumes there is a trusted curator who holds the data of individuals in a database and the goal of privacy is to simultaneously protect individual data while allowing the release of global characteristics of the database. In this setting we introduce a general framework for parametric inference with differential privacy guarantees. We first obtain differentially private estimators based on bounded influence M-estimators by leveraging their gross-error sensitivity in the calibration of a noise term added to them in order to ensure privacy. We then show how a similar construction can also be applied to construct differentially private test statistics analogous to the Wald, score and likelihood ratio tests. We provide statistical guarantees for all our proposals via an asymptotic analysis. An interesting consequence of our results is to further clarify the connection between differential privacy and robust statistics. In particular, we demonstrate that differential privacy is a weaker stability requirement than infinitesimal robustness, and show that robust M-estimators can be easily randomized in order to guarantee both differential privacy and robustness towards the presence of contaminated data. We illustrate our results both on simulated and real data.

estimator, influence function, sensitivity, (17 more...)

arXiv.org Machine Learning

1911.10167

Country:

North America > United States > New York > New York County > New York City (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > Massachusetts (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.82)
(2 more...)

Add feedback

Low-variance Black-box Gradient Estimates for the Plackett-Luce Distribution

Gadetsky, Artyom, Struminsky, Kirill, Robinson, Christopher, Quadrianto, Novi, Vetrov, Dmitry

arXiv.org Machine LearningNov-22-2019

Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. In particular, the control variates allow us to optimize black-box functions over permutations using stochastic gradient descent. To illustrate the approach, we consider a variety of causal structure learning tasks for continuous and discrete data. We show that our method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.

estimator, optimization, permutation, (13 more...)

arXiv.org Machine Learning

1911.10036

Country: Europe > Spain > Canary Islands (0.04)

Genre: Research Report (0.82)

Industry: Transportation > Air (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)

Add feedback

Investigating bankruptcy prediction models in the presence of extreme class imbalance and multiple stages of economy

Islam, Sheikh Rabiul, Eberle, William, Ghafoor, Sheikh K., Bundy, Sid C., Talbert, Douglas A., Siraj, Ambareen

arXiv.org Machine LearningNov-22-2019

In the area of credit risk analytics, current Bankruptcy Prediction Models (BPMs) struggle with (a) the availability of comprehensive and real-world data sets and (b) the presence of extreme class imbalance in the data (i.e., very few samples for the minority class) that degrades the performance of the prediction model. Moreover, little research has compared the relative performance of well-known BPM's on public datasets addressing the class imbalance problem. In this work, we apply eight classes of well-known BPMs, as suggested by a review of decades of literature, on a new public dataset named Freddie Mac Single-Family Loan-Level Dataset with resampling (i.e., adding synthetic minority samples) of the minority class to tackle class imbalance. Additionally, we apply some recent AI techniques (e.g., tree-based ensemble techniques) that demonstrate potentially better results on models trained with resampled data. In addition, from the analysis of 19 years (1999-2017) of data, we discover that models behave differently when presented with sudden changes in the economy (e.g., a global financial crisis) resulting in abrupt fluctuations in the national default rate. In summary, this study should aid practitioners/researchers in determining the appropriate model with respect to data that contains a class imbalance and various economic stages.

algorithm, class imbalance, dataset, (10 more...)

arXiv.org Machine Learning

1911.09858

Country:

Europe (0.14)
North America > United States > Tennessee > Putnam County > Cookeville (0.04)
North America > United States > Michigan (0.04)
North America > United States > Kansas (0.04)

Genre:

Overview (0.93)
Research Report > Experimental Study (0.46)

Industry:

Banking & Finance > Real Estate (1.00)
Banking & Finance > Economy (1.00)
Banking & Finance > Credit (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

DBSN: Measuring Uncertainty through Bayesian Learning of Deep Neural Network Structures

Deng, Zhijie, Luo, Yucen, Zhu, Jun, Zhang, Bo

arXiv.org Machine LearningNov-21-2019

Bayesian neural networks (BNNs) introduce uncertainty estimation to deep networks by performing Bayesian inference on network weights. However, such models bring the challenges of inference, and further BNNs with weight uncertainty rarely achieve superior performance to standard models. In this paper, we investigate a new line of Bayesian deep learning by performing Bayesian reasoning on the structure of deep neural networks. Drawing inspiration from the neural architecture search, we define the network structure as gating weights on the redundant operations between computational nodes, and apply stochastic variational inference techniques to learn the structure distributions of networks. Empirically, the proposed method substantially surpasses the advanced deep neural networks across a range of classification and segmentation tasks. More importantly, our approach also preserves benefits of Bayesian principles, producing improved uncertainty estimation than the strong baselines including MC dropout and variational BNNs algorithms (e.g. noisy EK-FAC).

dbsn, network structure, neural network, (15 more...)

arXiv.org Machine Learning

1911.09804

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.64)

Add feedback

Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees

Zhang, Ruqi, De Sa, Christopher

arXiv.org Machine LearningNov-21-2019

Gibbs sampling is a Markov chain Monte Carlo method that is often used for learning and inference on graphical models. Minibatching, in which a small random subset of the graph is used at each iteration, can help make Gibbs sampling scale to large graphical models by reducing its computational cost. In this paper, we propose a new auxiliary-variable minibatched Gibbs sampling method, {\it Poisson-minibatching Gibbs}, which both produces unbiased samples and has a theoretical guarantee on its convergence rate. In comparison to previous minibatched Gibbs algorithms, Poisson-minibatching Gibbs supports fast sampling from continuous state spaces and avoids the need for a Metropolis-Hastings correction on discrete state spaces. We demonstrate the effectiveness of our method on multiple applications and in comparison with both plain Gibbs and previous minibatched methods.

exp, gibbs, log null 1, (15 more...)

arXiv.org Machine Learning

1911.09771

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Parallelising MCMC via Random Forests

Changye, Wu, Robert, Christian P.

arXiv.org Machine LearningNov-21-2019

Markov chain Monte Carlo (MCMC) algorithm, a generic sampling method, is ubiquitous in modern statistics, especially in Bayesian fields. MCMC algorithms require only the evaluation of the target pointwise, up to a multiple constant, in order to sample from it. In Bayesian analysis, the object of main interest is the posterior, which is not in closed form in general, and MCMC has become a standard tool in this domain. However, MCMC is difficult to scale and its applications are limited when the observation size is very large, for it needs to sweep over the entire observations set in order to evaluate the likelihood function at each iteration. Recently, many methods have been proposed to better scale MCMC algorithms for big data sets and these can be roughly classified into two groups Bardenet et al. (2017): divide-and-conquer methods and subsampling-based methods. For divide-and-conquer methods, one splits the whole data set into subsets, runs MCMC over each subset to generate samples of parameters and combine these to produce an approximation of the true posterior. Depending on how MCMC is handled over the subsets, these methods can be further classified into two sub-categories.

posterior, subposterior, true posterior, (15 more...)

arXiv.org Machine Learning

1911.09698

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Generalizing Information to the Evolution of Rational Belief

Duersch, Jed A., Catanach, Thomas A.

arXiv.org Machine LearningNov-21-2019

Information theory provides a mathematical foundation to measure uncertainty in belief. Belief is represented by a probability distribution that captures our understanding of an outcome's plausibility. Information measures based on Shannon's concept of entropy include realization information, Kullback-Leibler divergence, Lindley's information in experiment, cross entropy, and mutual information. We derive a general theory of information from first principles that accounts for evolving belief and recovers all of these measures. Rather than simply gauging uncertainty, information is understood in this theory to measure change in belief. We may then regard entropy as the information we expect to gain upon realization of a discrete latent random variable. This theory of information is compatible with the Bayesian paradigm in which rational belief is updated as evidence becomes available. Furthermore, this theory admits novel measures of information with well-defined properties, which we explore in both analysis and experiment. This view of information illuminates the study of machine learning by allowing us to quantify information captured by a predictive model and distinguish it from residual information contained in training data. We gain related insights regarding feature selection, anomaly detection, and novel Bayesian approaches.

experiment, inference, information, (15 more...)

arXiv.org Machine Learning

1911.09559

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Alameda County > Livermore (0.04)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

On Universal Features for High-Dimensional Learning and Inference

Huang, Shao-Lun, Makur, Anuran, Wornell, Gregory W., Zheng, Lizhong

arXiv.org Machine LearningNov-20-2019

We consider the problem of identifying universal low-dimensional features from high-dimensional data for inference tasks in settings involving learning. For such problems, we introduce natural notions of universality and we show a local equivalence among them. Our analysis is naturally expressed via information geometry, and represents a conceptually and computationally useful analysis. The development reveals the complementary roles of the singular value decomposition, Hirschfeld-Gebelein-R\'enyi maximal correlation, the canonical correlation and principle component analyses of Hotelling and Pearson, Tishby's information bottleneck, Wyner's common information, Ky Fan $k$-norms, and Brieman and Friedman's alternating conditional expectations algorithm. We further illustrate how this framework facilitates understanding and optimizing aspects of learning systems, including multinomial logistic (softmax) regression and the associated neural network architecture, matrix factorization methods for collaborative filtering and other applications, rank-constrained multivariate linear regression, and forms of semi-supervised learning.

decomposition, denote, matrix, (16 more...)

arXiv.org Machine Learning

1911.09105

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Greater London > London (0.13)
(27 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Media > Film (0.45)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Add feedback