AITopics | Bayesian Learning


Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)


Deep Diffeomorphic Normalizing Flows

Salman, Hadi, Yadollahpour, Payman, Fletcher, Tom, Batmanghelich, Kayhan

arXiv.org Machine Learning | Oct-7-2018

The Normalizing Flow (NF) models a general probability density by estimating an invertible transformation applied on samples drawn from a known distribution. We introduce a new type of NF, called Deep Diffeomorphic Normalizing Flow (DDNF). A diffeomorphic flow is an invertible function where both the function and its inverse are smooth. We construct the flow using an ordinary differential equation (ODE) governed by a time-varying smooth vector field. We use a neural network to parametrize the smooth vector field and a recursive neural network (RNN) for approximating the solution of the ODE. Each cell in the RNN is a residual network implementing one Euler integration step. The architecture of our flow enables efficient likelihood evaluation, straightforward flow inversion, and results in highly flexible density estimation. An end-to-end trained DDNF achieves competitive results with state-of-the-art methods on a suite of density estimation and variational inference tasks. Finally, our method brings concepts from Riemannian geometry that, we believe, can open a new research direction for neural density estimation.
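The Euler-step construction described in the abstract above can be illustrated with a minimal one-dimensional sketch. It is not the authors' architecture: `velocity` is a hypothetical hand-written stand-in for the neural network that parametrizes the vector field, and the step count and time horizon are assumed values. The forward map applies one explicit Euler update per step, and the inverse is approximated by stepping backward through the same time grid.

```python
import math

def velocity(x, t):
    """Hypothetical smooth, time-varying vector field (stand-in for a neural network)."""
    return math.tanh(x) * math.cos(t) + 0.5 * math.sin(t)

def flow_forward(x, n_steps=200, t_end=1.0):
    """Push a sample through the flow with explicit Euler steps."""
    h = t_end / n_steps
    for k in range(n_steps):
        # One residual-style update: x <- x + h * v(x, t_k)
        x = x + h * velocity(x, k * h)
    return x

def flow_inverse(y, n_steps=200, t_end=1.0):
    """Approximate the inverse by stepping backward over the same time grid."""
    h = t_end / n_steps
    for k in reversed(range(n_steps)):
        # Undo one Euler step approximately: y <- y - h * v(y, t_k)
        y = y - h * velocity(y, k * h)
    return y

x0 = 0.7
y = flow_forward(x0)
x_back = flow_inverse(y)
round_trip_error = abs(x_back - x0)
```

The reconstruction is only approximate in this sketch; the round-trip error shrinks as the step size decreases.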

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv: 1810.03256

Genre: Research Report (0.70)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Deep convolutional Gaussian processes

Blomqvist, Kenneth, Kaski, Samuel, Heinonen, Markus

arXiv.org Machine Learning | Oct-6-2018

We propose deep convolutional Gaussian processes, a deep Gaussian process architecture with convolutional structure. The model is a principled Bayesian framework for detecting hierarchical combinations of local features for image classification. We demonstrate greatly improved image classification performance compared to current Gaussian process approaches on the MNIST and CIFAR-10 datasets. In particular, we improve CIFAR-10 accuracy by over 10 percentage points.

artificial intelligence, machine learning, modeling & simulation, (13 more...)

arXiv: 1810.03052

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)


Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes

Lee, Gilwoo, Choudhury, Sanjiban, Hou, Brian, Srinivasa, Siddhartha S.

arXiv.org Machine Learning | Oct-6-2018

We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating Bayesian belief updates into long-term expected return. However, computing an exact optimal Bayesian policy is intractable. Our key insight is to compute a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function. We prove the near-optimality of our algorithm and analyze a number of schemes that boost the algorithm's efficiency. Finally, we empirically validate our approach on a number of discrete and continuous BAMDPs and show that the learned policy has consistently competitive performance against baseline approaches.
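The idea of covering a continuous space with finitely many samples and exploiting Lipschitz continuity can be sketched in one dimension. This is not the Bayes-CPACE algorithm itself: the scalar input, the use of `math.sin` as a stand-in value function, and the names `lipschitz_upper_bound`, `lipschitz_const`, and `value_max` are all assumptions made for illustration. The sketch only shows how known values at sample points yield an optimistic bound at an unseen query point.

```python
import math

def lipschitz_upper_bound(query, samples, values, lipschitz_const, value_max):
    """Optimistic estimate at `query` from a finite set of evaluated samples.

    If f is L-Lipschitz, then f(query) <= f(x_i) + L * |query - x_i| for every
    sample x_i, so the minimum over samples (capped at value_max) is a valid bound.
    """
    best = value_max
    for x, v in zip(samples, values):
        best = min(best, v + lipschitz_const * abs(query - x))
    return best

# Stand-in "value function": sin is 1-Lipschitz and bounded above by 1.
samples = [0.0, 1.0, 2.0, 3.0]
values = [math.sin(x) for x in samples]

bound_at_sample = lipschitz_upper_bound(1.0, samples, values, 1.0, 1.0)
bound_between = lipschitz_upper_bound(1.5, samples, values, 1.0, 1.0)
```

At a sample point the bound coincides with the known value; between samples it loosens in proportion to the distance from the nearest informative sample.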

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv: 1810.03048

Country: North America > United States (0.93)

Genre: Research Report (0.40)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)


Text Classification of the Precursory Accelerating Seismicity Corpus: Inference on some Theoretical Trends in Earthquake Predictability Research from 1988 to 2018

Mignan, Arnaud

arXiv.org Machine Learning | Oct-5-2018

Text analytics based on supervised machine learning classifiers has shown great promise in a multitude of domains, but has yet to be applied to seismology. We test various standard models (Naive Bayes, k-Nearest Neighbors, Support Vector Machines, and Random Forests) on a seismological corpus of 100 articles related to the topic of precursory accelerating seismicity, spanning from 1988 to 2010. This corpus was labelled in Mignan (2011) according to whether the precursor was explained by critical processes (i.e., cascade triggering) or by other processes (such as the signature of main fault loading). We investigate whether the classification process can be automated to help analyze larger corpora in order to better understand trends in earthquake predictability research. We find that the Naive Bayes model performs best, in agreement with the machine learning literature for the case of small datasets, with cross-validation accuracies of 86% for binary classification. For a refined multiclass classification ('non-critical process' < 'agnostic' < 'critical process assumed' < 'critical process demonstrated'), we obtain up to 78% accuracy. Prediction on a dozen articles published since 2011, however, shows weak generalization with an F1-score of 60%, only slightly better than a random classifier, which can be explained by a change of authorship and the use of different terminologies. Yet the model shows F1-scores greater than 80% for the two multiclass extremes ('non-critical process' versus 'critical process demonstrated'), while it falls to random classifier results (around 25%) for papers labelled 'agnostic' or 'critical process assumed'. These results are encouraging in view of the small size of the corpus and the high degree of abstraction of the labelling. Domain knowledge engineering remains essential but can be made transparent by an investigation of Naive Bayes keyword posterior probabilities.
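A multinomial Naive Bayes text classifier, and the keyword posterior probabilities mentioned at the end of the abstract above, can be sketched as follows. The four toy "documents", their labels, and all function names are invented for illustration and are not drawn from the Mignan corpus; Laplace smoothing with `alpha = 1` is likewise an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with Laplace smoothing `alpha`."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    log_prior = {c: math.log(k / len(docs)) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def predict(text, log_prior, log_lik):
    """Return the class with the highest log posterior; unseen words are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in text.split() if w in log_lik[c]
        )
    return max(scores, key=scores.get)

def keyword_posterior(word, log_prior, log_lik):
    """P(class | word), proportional to P(class) * P(word | class)."""
    joint = {c: math.exp(log_prior[c] + log_lik[c][word]) for c in log_prior}
    norm = sum(joint.values())
    return {c: p / norm for c, p in joint.items()}

# Toy data, invented for this sketch only.
docs = [
    "cascade triggering critical point",
    "critical point power law",
    "fault loading stress accumulation",
    "stress loading silent slip",
]
labels = ["critical", "critical", "non-critical", "non-critical"]

log_prior, log_lik = train_naive_bayes(docs, labels)
predicted = predict("cascade triggering near critical point", log_prior, log_lik)
posterior = keyword_posterior("loading", log_prior, log_lik)
```

Inspecting `keyword_posterior` for individual words is one way to see which terms drive a classification, which is the kind of transparency the abstract alludes to.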

earthquake, survey article, upstream oil & gas, (21 more...)

arXiv: 1810.0348

Country:

North America > United States > California (0.47)
Europe > Italy (0.46)
Asia > Middle East > Republic of Türkiye (0.46)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)


IMMIGRATE: A Margin-based Feature Selection Method with Interaction Terms

Zhao, Ruzhang, Hong, Pengyu, Liu, Jun S.

arXiv.org Machine Learning | Oct-5-2018

By balancing margin-quantity maximization and margin-quality maximization, the proposed IMMIGRATE algorithm considers both local and global information when using margin-based frameworks. We derive a new mathematical interpretation of the margin-based cost function by using the quadratic form distance (QFD) and applying both the large-margin and max-min entropy principles. We also design a new principle for classifying new samples and propose a Bayesian framework to iteratively minimize the cost function. We demonstrate the power of our new method by comparing it with 16 widely used classifiers (e.g., Support Vector Machine, k-nearest neighbors, RELIEF), including some classifiers that are capable of identifying interaction terms (e.g., SODA, hierNet), on a synthetic dataset, five gene expression datasets, and twenty UCI machine learning datasets. Our method outperforms the other methods in most cases.

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv: 1810.02658

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.70)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.53)


On Theory for BART

Rockova, Veronika, Saha, Enakshi

arXiv.org Machine Learning | Oct-5-2018

Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying the foundations for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into branching process theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes exploiting their connection to random walks. We conclude with a result stating the optimal rate of posterior convergence for BART.
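The link between a tree prior and a heterogeneous Galton-Watson process can be made concrete by simulating the total number of nodes (the total progeny) of a randomly grown binary tree. The split rule used here, where a node at depth d splits with probability alpha * (1 + d)^(-beta), is the tree prior of Chipman et al. (2010) as it is commonly stated; the abstract does not spell it out, so treat the formula and the values alpha = 0.95 and beta = 2 as assumptions of this sketch. The function name `sample_tree_size` is hypothetical.

```python
import random

def sample_tree_size(alpha=0.95, beta=2.0, rng=random):
    """Draw the total number of nodes of a binary tree grown top-down.

    Each node at depth d independently splits into two children with
    probability alpha * (1 + d) ** (-beta); otherwise it becomes a leaf.
    The offspring distribution changes with depth, so the process is a
    heterogeneous (generation-dependent) Galton-Watson process.
    """
    total = 0
    frontier = [0]  # depths of nodes still awaiting a split decision
    while frontier:
        depth = frontier.pop()
        total += 1
        if rng.random() < alpha * (1 + depth) ** (-beta):
            frontier.extend([depth + 1, depth + 1])
    return total

rng = random.Random(0)  # fixed seed so the simulation is reproducible
sizes = [sample_tree_size(rng=rng) for _ in range(1000)]
mean_size = sum(sizes) / len(sizes)
largest = max(sizes)
```

Because the split probability decays with depth, large trees are rare in the simulated sample, which is the kind of tail behaviour that bounds on total progeny quantify.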

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv: 1810.00787

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Projective Inference in High-dimensional Problems: Prediction and Feature Selection

Piironen, Juho, Paasiniemi, Markus, Vehtari, Aki

arXiv.org Machine Learning | Oct-4-2018

This paper discusses predictive inference and feature selection for generalized linear models with scarce but high-dimensional data. We argue that in many cases one can benefit from a decision-theoretically justified two-stage approach: first, construct a possibly non-sparse model that predicts well, and then find a minimal subset of features that characterize the predictions. The model built in the first step is referred to as the "reference model" and the operation during the latter step as predictive "projection". The key characteristic of this approach is that it finds an excellent tradeoff between sparsity and predictive accuracy, and the gain comes from utilizing all available information, including the prior and the information coming from the left-out features. We review several methods that follow this principle and provide novel methodological contributions. We present a new projection technique that unifies two existing techniques and is both accurate and fast to compute. We also propose a way of evaluating the feature selection process using fast leave-one-out cross-validation that allows for easy and intuitive model size selection. Furthermore, we prove a theorem that helps to understand the conditions under which the projective approach could be beneficial. The benefits are illustrated via several simulated and real-world examples.
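The two-stage idea can be sketched for the simplest case, a Gaussian linear model, where projecting onto a submodel amounts to a least-squares fit to the reference model's predictions rather than to the observed data. This is a simplified illustration and not the projection technique proposed in the paper: the reference model is assumed to be already fitted and is represented by the hypothetical weights `ref_weights`, the data are synthetic, and only single-feature submodels are compared (the first step of a forward search).

```python
import random

def project_onto_feature(x_col, mu_ref):
    """Fit intercept + slope on one feature to the reference predictions.

    Returns (intercept, slope, discrepancy), where discrepancy is the mean
    squared difference between reference and projected predictions.
    """
    n = len(x_col)
    mean_x = sum(x_col) / n
    mean_mu = sum(mu_ref) / n
    sxx = sum((x - mean_x) ** 2 for x in x_col)
    sxm = sum((x - mean_x) * (m - mean_mu) for x, m in zip(x_col, mu_ref))
    slope = sxm / sxx
    intercept = mean_mu - slope * mean_x
    discrepancy = sum(
        (m - intercept - slope * x) ** 2 for x, m in zip(x_col, mu_ref)
    ) / n
    return intercept, slope, discrepancy

rng = random.Random(0)
n_obs, n_features = 50, 3
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_obs)]

# Stage 1 (assumed done): a reference model with hypothetical fitted weights.
ref_weights = [2.0, 0.1, 0.0]
mu_ref = [sum(w * x for w, x in zip(ref_weights, row)) for row in X]

# Stage 2: project the reference predictions onto each single-feature submodel.
discrepancies = []
for j in range(n_features):
    column = [row[j] for row in X]
    _, _, d = project_onto_feature(column, mu_ref)
    discrepancies.append(d)
best_feature = discrepancies.index(min(discrepancies))
```

The submodel is scored by how closely it reproduces the reference predictions, so the selection step never refits to the noisy observations directly.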

projection, reference model, selection, (14 more...)

doi: 10.1214/20-EJS1711

arXiv: 1810.02406

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)


An Overview of Clinical Applications of Artificial Intelligence

#artificialintelligence | Oct-3-2018, 06:11:13 GMT

AI will change radiology, but it won't replace radiologists.

artificial intelligence, machine learning, natural language, (15 more...)


Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada > Ontario (0.46)
North America > Canada > Quebec (0.28)

Genre:

Research Report > Experimental Study (0.93)
Overview (0.93)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(19 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(5 more...)


Inhibited Softmax for Uncertainty Estimation in Neural Networks

Możejko, Marcin, Susik, Mateusz, Karczewski, Rafał

arXiv.org Machine Learning | Oct-3-2018

We present a new method for uncertainty estimation and out-of-distribution detection in neural networks with softmax output. We extend the softmax layer with an additional constant input. The corresponding additional output is able to represent the uncertainty of the network. The proposed method requires neither additional parameters nor multiple forward passes nor input preprocessing nor out-of-distribution datasets. We show that our method performs comparably to more computationally expensive methods and outperforms baselines in our experiments in the image recognition and sentiment analysis domains. The applications of computational learning systems might cause intrusive effects if we assume that predictions are always as accurate as during the experimental phase. Examples include misclassified traffic signs (Evtimov et al., 2018) and an image tagger that classified two African Americans as gorillas (Curtis, 2015). This is often caused by overconfidence of models that has been observed in the case of deep neural networks (Guo et al., 2017). Such malfunctions can be prevented if we correctly estimate the uncertainty of the machine learning system.
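The core mechanism, a softmax extended with one extra constant input whose output is read as uncertainty, can be sketched in a few lines. The function name `inhibited_softmax`, the value of `constant`, and the example logits are assumptions for illustration; any changes the paper makes to the rest of the network or to training are not reproduced here.

```python
import math

def inhibited_softmax(logits, constant=1.0):
    """Softmax over the class logits plus one constant extra input.

    Returns (class_probs, uncertainty), where `uncertainty` is the softmax
    output attached to the constant input. No extra parameters are needed.
    """
    extended = list(logits) + [constant]
    shift = max(extended)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in extended]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[:-1], probs[-1]

# A confident prediction: one logit clearly dominates the constant.
confident_probs, confident_u = inhibited_softmax([5.0, 0.0, 0.0])

# An unsure prediction: no logit exceeds the constant.
unsure_probs, unsure_u = inhibited_softmax([0.0, 0.0, 0.0])
```

When no class logit stands out against the constant, more probability mass flows to the extra output, so it can be thresholded as an uncertainty or out-of-distribution signal from a single forward pass.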

experiment, inhibited softmax, neural network, (14 more...)

arXiv: 1810.01861

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)


A Bayesian model for sparse graphs with flexible degree distribution and overlapping community structure

Lee, Juho, James, Lancelot F., Choi, Seungjin, Caron, François

arXiv.org Machine Learning | Oct-3-2018

We consider a non-projective class of inhomogeneous random graph models with interpretable parameters and a number of interesting asymptotic properties. Using the results of Bollobás et al. [2007], we show that i) the class of models is sparse and ii) depending on the choice of the parameters, the model is either scale-free, with power-law exponent greater than 2, or with an asymptotic degree distribution which is power-law with exponential cut-off. We propose an extension of the model that can accommodate an overlapping community structure. Scalable posterior inference can be performed due to the specific choice of the link probability. We present experiments on five different real-world networks with up to 100,000 nodes and edges, showing that the model can provide a good fit to the degree distribution and recovers well the latent community structure.

artificial intelligence, degree distribution, machine learning, (18 more...)

arXiv: 1810.01778

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
