AITopics | Bayesian Learning

Collaborating Authors

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Parsimonious Bayesian deep networks

Zhou, Mingyuan

arXiv.org Machine LearningMay-22-2018

Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is the development of a special infinite-wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other one is the construction of a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, providing state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries, while maintaining low computational complexity for out-of-sample prediction.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1805.08719

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Amortized Inference Regularization

Shu, Rui, Bui, Hung H., Zhao, Shengjia, Kochenderfer, Mykel J., Ermon, Stefano

arXiv.org Artificial IntelligenceMay-22-2018

The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.

artificial intelligence, inference model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1805.08913

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits

Singh, Prashant, Hellander, Andreas

arXiv.org Machine LearningMay-22-2018

Approximate Bayesian computation is an established and popular method for likelihood-free inference with applications in many disciplines. The effectiveness of the method depends critically on the availability of well performing summary statistics. Summary statistic selection relies heavily on domain knowledge and carefully engineered features, and can be a laborious time consuming process. Since the method is sensitive to data dimensionality, the process of selecting summary statistics must balance the need to include informative statistics and the dimensionality of the feature vector. This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem. This allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well performing summary statistics as opposed to a fixed set of statistics. The proposed method is unique in that it does not require any pre-processing and is scalable to a large number of candidate statistics. This enables efficient use of a large library of possible time series summary statistics without prior feature engineering. The proposed approach is compared to state-of-the-art methods for summary statistics selection using a challenging test problem from the systems biology literature.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1805.08647

Country: Europe > Sweden (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.84)

Add feedback

Conditional Network Embeddings

Kang, Bo, Lijffijt, Jefrey, De Bie, Tijl

arXiv.org Machine LearningMay-22-2018

Network embeddings map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that `similar' nodes are mapped onto nearby points, such that the embedding can be used for purposes such as link prediction (if `similar' means being `more likely to be connected') or classification (if `similar' means `being more likely to have the same label'). In recent years various methods for network embedding have been introduced. These methods all follow a similar strategy, defining a notion of similarity between nodes (typically deeming nodes more similar if they are nearby in the network in some metric), a distance measure in the embedding space, and minimizing a loss function that penalizes large distances for similar nodes or small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties, such as (approximate) multipartiteness, certain degree distributions, or certain kinds of assortativity. Overcoming this difficulty, we introduce a conceptual innovation to the literature on network embedding, proposing to create embeddings that maximally add information with respect to such structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. Finally, we demonstrate that the combination of information such structural properties and a Euclidean embedding provides superior performance across a range of link prediction tasks. Moreover, we demonstrate the potential of our approach for network visualization.

data mining, machine learning, node, (18 more...)

arXiv.org Machine Learning

1805.07544

Country: North America > United States (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

A Gentle Introduction to Maximum Likelihood Estimation

@machinelearnbotMay-21-2018, 04:10:33 GMT

The first time I heard someone use the term maximum likelihood estimation, I went to Google and found out what it meant. Then I went to Wikipedia to find out what it really meant. To spare you the wrestling required to understand and incorporate MLE into your data science workflow, ethos, and projects, I've compiled this guide. This is funny (if you follow this strange domain of humor), and mostly right about the differences between the two camps. Not minding that our Sun going into nova is not really a repeatable experiment -- sorry, frequentists!

bayesian inference, machine learning, maximum likelihood estimation, (2 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

The Roles of Supervised Machine Learning in Systems Neuroscience

Glaser, Joshua I., Benjamin, Ari S., Farhoodi, Roozbeh, Kording, Konrad P.

arXiv.org Machine LearningMay-21-2018

Over the last several years, the use of machine learning (ML) in neuroscience has been increasing exponentially. Here, we review ML's contributions, both realized and potential, across several areas of systems neuroscience. We describe four primary roles of ML within neuroscience: 1) creating solutions to engineering problems, 2) identifying predictive variables, 3) setting benchmarks for simple models of the brain, and 4) serving itself as a model for the brain. The breadth and ease of its applicability suggests that machine learning should be in the toolbox of most systems neuroscientists.

artificial intelligence, machine learning, neural network, (13 more...)

arXiv.org Machine Learning

1805.08239

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Streaming MANN: A Streaming-Based Inference for Energy-Efficient Memory-Augmented Neural Networks

Park, Seongsik, Jang, Jaehee, Kim, Seijoon, Yoon, Sungroh

arXiv.org Machine LearningMay-21-2018

With the successful development of artificial intelligence using deep learning, there has been growing interest in its deployment. The mobile environment is the closest hardware platform to real life, and it has become an important platform for the success or failure of artificial intelligence. Memory-augmented neural networks (MANNs) are neural networks proposed to efficiently handle question-and-answer (Q&A) tasks, well-suited for mobile devices. As a MANN requires various types of operations and recurrent data paths, it is difficult to accelerate the inference in the structure designed for other conventional neural network models, which is one of the biggest obstacles to deploying MANNs in mobile environments. To address the aforementioned issues, we propose Streaming MANN. This is the first attempt to implement and demonstrate the architecture for energy-efficient inference of MANNs with the concept of streaming processing. To achieve the full potential of the streaming process, we propose a novel approach, called inference thresholding, using Bayesian approach considering the characteristics of natural language processing (NLP) tasks. To evaluate our proposed approaches, we implemented the architecture and method in a field-programmable gate array (FPGA) which is suitable for streaming processing. We measured the execution time and power consumption of the inference for the bAbI dataset. The experimental results showed that the performance efficiency per energy (FLOPS/kJ) of the Streaming MANN increased by a factor of up to about 126 compared to the results of NVIDIA TITAN V, and up to 140 if inference thresholding is applied.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1805.07978

Genre: Research Report > New Finding (0.54)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting

Ritter, Hippolyt, Botev, Aleksandar, Barber, David

arXiv.org Machine LearningMay-20-2018

We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1805.0781

Genre: Research Report (1.00)

Industry: Education (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Minimax Lower Bounds for Cost Sensitive Classification

Kamalaruban, Parameswaran, Williamson, Robert C.

arXiv.org Machine LearningMay-20-2018

The central problem of this paper is the cost-sensitive binary classification problem, where different costs are associated with different types of mistakes. Several important machine learning applications such as medical decision making, targeted marketing, and intrusion detection can be naturally formalized as costsensitive classification setup ([1]). In these domains, the cost of missing a target is much higher than that of a false-positive, and classifiers that do not take misclassification costs into account do not perform well. The cost-sensitive classification problem has been extensively studied, and people have developed efficient algorithms with provable guarantees on the (generalization) error [6, 9, 26, 27, 11, 4]. These methods primarily take existing classification methods based on empirical risk minimization and try to adapt them in various ways to be sensitive to these misclassification costs. Despite all these efforts, the understanding of the fundamental limits of this problem is still missing. In this paper, we study the hardness of this problem by obtaining minimax lower bounds. In particular, we are interested in understanding how the cost parameter influences the hardness or complexity of the cost-sensitive classification. Minimax Lower Bounds Understanding the hardness or fundamental limits of a learning problem is important for practice for the following reasons: - They give an estimate on the number of samples required for a good performance of a learning algorithm.

artificial intelligence, classification problem, machine learning, (14 more...)

arXiv.org Machine Learning

1805.07723

Genre: Research Report (0.50)

Industry:

Education (0.49)
Health & Medicine (0.48)
Law Enforcement & Public Safety (0.48)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(2 more...)

Add feedback

Distributionally Robust Inverse Covariance Estimation: The Wasserstein Shrinkage Estimator

Nguyen, Viet Anh, Kuhn, Daniel, Esfahani, Peyman Mohajerin

arXiv.org Machine LearningMay-18-2018

We introduce a distributionally robust maximum likelihood estimation model with a Wasserstein ambiguity set to infer the inverse covariance matrix of a $p$-dimensional Gaussian random vector from $n$ independent samples. The proposed model minimizes the worst case (maximum) of Stein's loss across all normal reference distributions within a prescribed Wasserstein distance from the normal distribution characterized by the sample mean and the sample covariance matrix. We prove that this estimation problem is equivalent to a semidefinite program that is tractable in theory but beyond the reach of general purpose solvers for practically relevant problem dimensions $p$. In the absence of any prior structural information, the estimation problem has an analytical solution that is naturally interpreted as a nonlinear shrinkage estimator. Besides being invertible and well-conditioned even for $p>n$, the new shrinkage estimator is rotation-equivariant and preserves the order of the eigenvalues of the sample covariance matrix. These desirable properties are not imposed ad hoc but emerge naturally from the underlying distributionally robust optimization model. Finally, we develop a sequential quadratic approximation algorithm for efficiently solving the general estimation problem subject to conditional independence constraints typically encountered in Gaussian graphical models.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

1805.07194

Country:

Europe (0.67)
North America > United States (0.67)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.67)
Health & Medicine > Therapeutic Area > Hematology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.56)

Add feedback