AITopics | Bayesian Learning

Collaborating Authors

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Mathematics behind Machine Learning – The Concepts you Need to Know

#artificialintelligenceOct-15-2019, 05:53:54 GMT

We can easily use the widely available libraries available in Python and R to build models!" I have lost count of the number of times I've heard this from amateur data scientists. This fallacy is all too common and has created a false expectation among aspiring data science professionals. Let's get this out of the way right now – you need to understand the mathematics behind machine learning algorithms to become a data scientist. There is no way around it. It is an intrinsic part of a data scientist's role and every recruiter and experienced machine learning professional will vouch for this.

data science, machine learning, mathematics, (11 more...)

#artificialintelligence

Industry: Education (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Buesing, Lars, Heess, Nicolas, Weber, Theophane

arXiv.org Artificial IntelligenceOct-15-2019

A plethora of problems in AI, engineering and the sciences are naturally formalized as inference in discrete probabilistic models. Exact inference is often prohibitively expensive, as it may require evaluating the (unnormalized) target density on its entire domain. Here we consider the setting where only a limited budget of calls to the unnormalized density oracle is available, raising the challenge of where in the domain to allocate these function calls in order to construct a good approximate solution. We formulate this problem as an instance of sequential decision-making under uncertainty and leverage methods from reinforcement learning for probabilistic inference with budget constraints. In particular, we propose the TreeSample algorithm, an adaptation of Monte Carlo Tree Search to approximate inference. This algorithm caches all previous queries to the density oracle in an explicit search tree, and dynamically allocates new queries based on a "best-first" heuristic for exploration, using existing upper confidence bound methods. Our non-parametric inference method can be effectively combined with neural networks that compile approximate conditionals of the target, which are then used to guide the inference search and enable generalization across multiple target distributions. We show empirically that TreeSample outperforms standard approximate inference methods on synthetic factor graphs.

artificial intelligence, bayesian inference, inference, (18 more...)

arXiv.org Artificial Intelligence

1910.06862

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Learning Sample-Specific Models with Low-Rank Personalized Regression

Lengerich, Benjamin, Aragam, Bryon, Xing, Eric P.

arXiv.org Machine LearningOct-15-2019

Modern applications of machine learning (ML) deal with increasingly heterogeneous datasets comprised of data collected from overlapping latent subpopulations. As a result, traditional models trained over large datasets may fail to recognize highly predictive localized effects in favour of weakly predictive global patterns. This is a problem because localized effects are critical to developing individualized policies and treatment plans in applications ranging from precision medicine to advertising. To address this challenge, we propose to estimate sample-specific models that tailor inference and prediction at the individual level. In contrast to classical ML models that estimate a single, complex model (or only a few complex models), our approach produces a model personalized to each sample. These sample-specific models can be studied to understand subgroup dynamics that go beyond coarse-grained class labels. Crucially, our approach does not assume that relationships between samples (e.g. a similarity network) are known a priori. Instead, we use unmodeled covariates to learn a latent distance metric over the samples. We apply this approach to financial, biomedical, and electoral data as well as simulated data and show that sample-specific models provide fine-grained interpretations of complicated phenomena without sacrificing predictive accuracy compared to state-of-the-art models such as deep neural networks.

dataset, prediction, regression, (16 more...)

arXiv.org Machine Learning

1910.06939

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Indiana (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Government > Voting & Elections (1.00)
Banking & Finance (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Extracting robust and accurate features via a robust information bottleneck

Pensia, Ankit, Jog, Varun, Loh, Po-Ling

arXiv.org Machine LearningOct-15-2019

We propose a novel strategy for extracting features in supervised learning that can be used to construct a classifier which is more robust to small perturbations in the input space. Our method builds upon the idea of the information bottleneck by introducing an additional penalty term that encourages the Fisher information of the extracted features to be small, when parametrized by the inputs. By tuning the regularization parameter, we can explicitly trade off the opposing desiderata of robustness and accuracy when constructing a classifier. We derive the optimal solution to the robust information bottleneck when the inputs and outputs are jointly Gaussian, proving that the optimally robust features are also jointly Gaussian in that setting. Furthermore, we propose a method for optimizing a variational bound on the robust information bottleneck objective in general settings using stochastic gradient descent, which may be implemented efficiently in neural networks. Our experimental results for synthetic and real data sets show that the proposed feature extraction method indeed produces classifiers with increased robustness to perturbations.

inequality, information, robustness, (16 more...)

arXiv.org Machine Learning

1910.06893

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
(2 more...)

Add feedback

Counterfactual diagnosis

Richens, Jonathan G., Lee, Ciaran M., Johri, Saurabh

arXiv.org Artificial IntelligenceOct-15-2019

Causal knowledge is vital for effective reasoning in science and medicine. In medical diagnosis for example, a doctor aims to explain a patient's symptoms by determining the diseases causing them. However, all previous approaches to Machine-Learning assisted diagnosis, including Deep Learning and model-based Bayesian approaches, learn by association and do not distinguish correlation from causation. Here, we propose a new diagnostic algorithm based on counterfactual inference which captures the causal aspect of diagnosis overlooked by previous approaches. Using a statistical disease model, which describes the relations between hundreds of diseases, symptoms and risk factors, we compare our counterfactual algorithm to the standard Bayesian diagnostic algorithm, and test these against a cohort of 44 doctors. We use 1763 clinical vignettes created by a separate panel of doctors to benchmark performance. Each vignette provides a non-exhaustive list of symptoms and medical history simulating a single presentation of a disease. The algorithms and doctors are tasked with determining the underlying disease for each vignette from symptom and medical history information alone. While the Bayesian algorithm achieves the accuracy comparable to the average doctor, placing in the top 49\% of doctors in our cohort, our counterfactual algorithm places in the top 20\% of doctors, achieving expert clinical accuracy. Our results demonstrate the advantage of counterfactual over associative reasoning in a complex real-world task, and show that counterfactual reasoning is a vital missing ingredient for applying machine learning to medical diagnosis.

algorithm, diagnosis, symptom, (17 more...)

arXiv.org Artificial Intelligence

1910.06772

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
North America > Greenland (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Bayesian Temporal Factorization for Multidimensional Time Series Prediction

Sun, Lijun, Chen, Xinyu

arXiv.org Machine LearningOct-14-2019

Abstract--Large-scale and multidimensional spatiotemporal data sets are becoming ubiquitous in many real-world applications such as monitoring urban traffic and air quality . Making predictions on these time series has become a critical challenge due to not only the large-scale and high-dimensional nature but also the considerable amount of missing data. In this paper, we propose a Bayesian temporal factorization (BTF) framework for modeling multidimensional time series--in particular spatiotemporal data--in the presence of missing values. By integrating low-rank matrix/tensor factorization and vector autoregressive (VAR) process into a single probabilistic graphical model, this framework can characterize both global and local consistencies in large-scale time series data. The graphical model allows us to effectively perform probabilistic predictions and produce uncertainty estimates without imputing those missing values. We develop efficient Gibbs sampling algorithms for model inference and test the proposed BTF framework on several real-world spatiotemporal data sets for both missing data imputation and short-term/long-term rolling prediction tasks. The numerical experiments demonstrate the superiority of the proposed BTF approaches over many state-of-the-art techniques. With recent advances in sensing technologies, large-scale and multidimensional time series data--in particular spatiotemporal data--are collected on a continuous basis from various types of sensors and applications. Making predictions on these time series, such as forecasting urban traffic states and regional air quality, serves as a foundation to many real-world applications and benefits many scientific fields [1], [2]. For example, predicting the demand and states (e.g., speed, flow) of urban traffic is essential to a wide range of intelligent transportation systems (ITS) applications, such trip planning, travel time estimation, route planning, traffic signal control, to name but a few [3]. However, given the complex spatiotemporal dependencies in these data sets, making efficient and reliable predictions for real-time applications has been a longstanding and fundamental research challenge. Despite the vast body of literature on time series analysis from many scientific areas, three emerging issues in modern sensing technologies are constantly challenging the classical modeling frameworks. First, modern time series data are often large-scale, collected from a large number of subjects/locations/sensors simultaneously .

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1910.06366

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

MIM: Mutual Information Machine

Livne, Micha, Swersky, Kevin, Fleet, David J.

arXiv.org Machine LearningOct-14-2019

We introduce the Mutual Information Machine (MIM), an autoencoder model for learning joint distributions over observations and latent states. The model formulation reflects two key design principles: 1) symmetry, to encourage the encoder and decoder to learn consistent factorizations of the same underlying distribution; and 2) mutual information, to encourage the learning of useful representations for downstream tasks. The objective comprises the Jensen-Shannon divergence between the encoding and decoding joint distributions, plus a mutual information term. We show that this objective can be bounded by a tractable cross-entropy loss between the true model and a parameterized approximation, and relate this to maximum likelihood estimation and variational autoencoders. Experiments show that MIM is capable of learning a latent representation with high mutual information, and good unsupervised clustering, while providing data log likelihood comparable to VAE (with a sufficiently expressive architecture).

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1910.03175

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

PROFET: Construction and Inference of DBNs Based on Mathematical Models

Ajmal, Hamda, Madden, Michael, Enright, Catherine

arXiv.org Machine LearningOct-14-2019

PROFET: Construction and Inference of DBNs Based on Mathematical Models Hamda Ajmal, Michael Madden and Catherine Enright School of Computer Science, National University of Ireland Galway h.ajmal1@nuigalway.ie, Abstract This paper presents, evaluates, and discusses a new software tool to automatically build Dynamic Bayesian Networks (DBNs) from ordinary differential equations (ODEs) entered by the user. The DBNs generated from ODE models can handle both data uncertainty and model uncertainty in a principled manner. The application, named PROFET, can be used for temporal data mining with noisy or missing variables. It enables automatic re-estimation of model parameters using temporal evidence in the form of data streams. For temporal inference, PROFET includes both standard fixed time step particle filtering and its extension, adaptive-time particle filtering algorithms. Adaptive-time particle filtering enables the DBN to automatically adapt its time step length to match the dynamics of the model. We demonstrate PROFET's functionality by using it to infer the model variables by estimating the model parameters of four benchmark ODE systems. From the generation of the DBN model to temporal inference, the entire process is automated and is delivered as an open-source platform-independent software application with a comprehensive user interface. PROFET is released under the Apache License 2.0. Its source code, executable and documentation are available at http:://profet.

inference, model parameter, ode model, (16 more...)

arXiv.org Machine Learning

1910.04895

Country:

Europe > Ireland (0.24)
Europe > Czechia > Prague (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Notes on Lipschitz Margin, Lipschitz Margin Training, and Lipschitz Margin p-Values for Deep Neural Network Classifiers

Kesidis, George, Miller, David J.

arXiv.org Machine LearningOct-14-2019

A variety of papers have been recently produced on "robustifying " Deep Neural Networks (DNNs), particularly to adversarial Test-Time Evasion (TTE) attacks [14, 15, 13]. We discuss some of this work in Sections III.A and IV.A of [9 ] and argue for the need for TTE-attack detection [8] for robustness . In this note, we derive a local class purity result under the assumption of Lipschitz continuity, discuss Lipschitz margin training, and define an associated p-value. Estimation of the Lipschitz parameter for a given DNN is disc ussed in, e.g., [12, 14, 16, 4].

dnn, lipschitz margin, training dataset, (9 more...)

arXiv.org Machine Learning

1910.08032

Country:

Europe > Denmark > North Jutland > Aalborg (0.05)
North America > United States > Pennsylvania > Centre County > University Park (0.04)

Genre: Research Report (0.75)

Industry:

Information Technology > Security & Privacy (0.35)
Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

Add feedback

Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo

Cobb, Adam D., Baydin, Atılım Güneş, Markham, Andrew, Roberts, Stephen J.

arXiv.org Machine LearningOct-14-2019

We introduce a recent symplectic integration scheme derived for solving physically motivated systems with non-separable Hamiltonians. We show its relevance to Riemannian manifold Hamiltonian Monte Carlo (RMHMC) and provide an alternative to the currently used generalised leapfrog symplectic integrator, which relies on solving multiple fixed point iterations to convergence. Via this approach, we are able to reduce the number of higher-order derivative calculations per leapfrog step. We explore the implications of this integrator and demonstrate its efficacy in reducing the computational burden of RMHMC. Our code is provided in a new open-source Python package, hamiltorch.

integrator, monte carlo, rmhmc, (12 more...)

arXiv.org Machine Learning

1910.06243

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback