Bayesian Learning
Playing against Nature: causal discovery for decision making under uncertainty
Gonzalez-Soto, M., Sucar, L. E., Escalante, H. J.
We consider decision problems under uncertainty where the options available to a decision maker and the resulting outcome are related through a causal mechanism which is unknown to the decision maker. We ask how a decision maker can learn about this causal mechanism through sequential decision making as well as using current causal knowledge inside each round in order to make better choices had she not considered causal knowledge and propose a decision making procedure in which an agent holds \textit{beliefs} about her environment which are used to make a choice and are updated using the observed outcome. As proof of concept, we present an implementation of this causal decision making model and apply it in a simple scenario. We show that the model achieves a performance similar to the classic Q-learning while it also acquires a causal model of the environment.
Diagonal Discriminant Analysis with Feature Selection for High Dimensional Data
Romanes, Sarah Elizabeth, Ormerod, John Thomas, Yang, Jean YH
Classification problems involving high dimensional data are extensive in many fields such as finance, marketing, and bioinformatics. Unique challenges with high dimensional datasets are numerous and well known, with many classifiers built under traditional low dimensional frameworks simply unable to be applied to such high dimensional data. Discriminant Analysis (DA) is one such example (Fisher, 1936). DA classifiers work by assuming the distribution of the features is strictly Gaussian at the class level, and assign a particular point to the class label which minimises the Mahalanobis (for linear discriminant analysis (LDA)) distance between that point and the mean of the multivariate normal corresponding to such class. Although extraordinary simple and easy to use in low dimensional settings, DA is well known to be unusable in high dimensions due to the maximum likelihood estimate of the corresponding covariance matrix being singular when the number of features is greater than that of the observations.
Logical Explanations for Deep Relational Machines Using Relevance Information
Srinivasan, Ashwin, Vig, Lovekesh, Bain, Michael
Our interest in this paper is in the construction of symbolic explanations for predictions made by a deep neural network. We will focus attention on deep relational machines (DRMs, first proposed by H. Lodhi). A DRM is a deep network in which the input layer consists of Boolean-valued functions (features) that are defined in terms of relations provided as domain, or background, knowledge. Our DRMs differ from those proposed by Lodhi, which use an Inductive Logic Programming (ILP) engine to first select features (we use random selections from a space of features that satisfies some approximate constraints on logical relevance and non-redundancy). But why do the DRMs predict what they do? One way of answering this is the LIME setting, in which readable proxies for a black-box predictor. The proxies are intended only to model the predictions of the black-box in local regions of the instance-space. But readability alone may not enough: to be understandable, the local models must use relevant concepts in an meaningful manner. We investigate the use of a Bayes-like approach to identify logical proxies for local predictions of a DRM. We show: (a) DRM's with our randomised propositionalization method achieve state-of-the-art predictive performance; (b) Models in first-order logic can approximate the DRM's prediction closely in a small local region; and (c) Expert-provided relevance information can play the role of a prior to distinguish between logical explanations that perform equivalently on prediction alone.
Inference, Learning, and Population Size: Projectivity for SRL Models
Jaeger, Manfred, Schulte, Oliver
A subtle difference between propositional and relational data is that in many relational models, marginal probabilities depend on the population or domain size. This paper connects the dependence on population size to the classic notion of projectivity from statistical theory: Projectivity implies that relational predictions are robust with respect to changes in domain size. We discuss projectivity for a number of common SRL systems, and identify syntactic fragments that are guaranteed to yield projective models. The syntactic conditions are restrictive, which suggests that projectivity is difficult to achieve in SRL, and care must be taken when working with different domain sizes.
Is The Variational Bayesian Method The Most Difficult Machine Learning Technique?
Data scientist Stefano Cosentino observed in a post that the Bayesian approach leans more towards the distributions associated with each parameter. For instance, he writes that the two parameters depicted below, as shown by the Gaussian curves after a trained Bayesian network has converged. Hence the Bayesian approach, where the parameters are unknown quantities can be considered as random variables. University of Buffalo's paper defines the Bayesian approach to uncertainty, which treats all uncertain quantities as random variables and uses the laws of probability to manipulate those uncertain quantities. Hence, the right Bayesian approach integrates over all uncertain quantities rather than optimise them, states the paper.
The Mathematics of Machine Learning - AI Trends
In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, Tensorflow etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results.
New Heuristics for Parallel and Scalable Bayesian Optimization
Bayesian optimization has emerged as a strong candidate tool for global optimization of functions with expensive evaluation costs. However, due to the dynamic nature of research in Bayesian approaches, and the evolution of computing technology, using Bayesian optimization in a parallel computing environment remains a challenge for the non-expert. In this report, I review the state-of-the-art in parallel and scalable Bayesian optimization methods. In addition, I propose practical ways to avoid a few of the pitfalls of Bayesian optimization, such as oversampling of edge parameters and over-exploitation of high performance parameters. Finally, I provide relatively simple, heuristic algorithms, along with their open source software implementations, that can be immediately and easily deployed in any computing environment.
Model-based Exception Mining for Object-Relational Data
Riahi, Fatemeh, Schulte, Oliver
This paper is based on a previous publication [29]. Our work extends exception mining and outlier detection to the case of object-relational data. Object-relational data represent a complex heterogeneous network [12], which comprises objects of different types, links among these objects, also of different types, and attributes of these links. This special structure prohibits a direct vectorial data representation. We follow the well-established Exceptional Model Mining framework, which leverages machine learning models for exception mining: A object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. Exceptional objects can be viewed as outliers. We apply state of-the-art probabilistic modelling techniques for object-relational data that construct a graphical model (Bayesian network), which compactly represents probabilistic associations in the data. A new metric, derived from the learned object-relational model, quantifies the extent to which the individual association pattern of a potential outlier deviates from that of the whole population. The metric is based on the likelihood ratio of two parameter vectors: One that represents the population associations, and another that represents the individual associations. Our method is validated on synthetic datasets and on real-world data sets about soccer matches and movies. Compared to baseline methods, our novel transformed likelihood ratio achieved the best detection accuracy on all datasets.
Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities
Zitnik, Marinka, Nguyen, Francis, Wang, Bo, Leskovec, Jure, Goldenberg, Anna, Hoffman, Michael M.
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Accurate Uncertainties for Deep Learning Using Calibrated Regression
Kuleshov, Volodymyr, Fenner, Nathan, Ermon, Stefano
Methods for reasoning under uncertainty are a key building block of accurate and reliable machine learning systems. Bayesian methods provide a general framework to quantify uncertainty. However, because of model misspecification and the use of approximate inference, Bayesian uncertainty estimates are often inaccurate -- for example, a 90% credible interval may not contain the true outcome 90% of the time. Here, we propose a simple procedure for calibrating any regression algorithm; when applied to Bayesian and probabilistic models, it is guaranteed to produce calibrated uncertainty estimates given enough data. Our procedure is inspired by Platt scaling and extends previous work on classification. We evaluate this approach on Bayesian linear regression, feedforward, and recurrent neural networks, and find that it consistently outputs well-calibrated credible intervals while improving performance on time series forecasting and model-based reinforcement learning tasks.