Goto

Collaborating Authors

 Directed Networks


Simulations evaluating resampling methods for causal discovery: ensemble performance and calibration

arXiv.org Machine Learning

Causal discovery can be a powerful tool for investigating causality when a system can be observed but is inaccessible to experiments in practice. Despite this, it is rarely used in any scientific or medical fields. One of the major hurdles preventing the field of causal discovery from having a larger impact is that it is difficult to determine when the output of a causal discovery method can be trusted in a real-world setting. Trust is especially critical when human health is on the line. In this paper, we report the results of a series of simulation studies investigating the performance of different resampling methods as indicators of confidence in discovered graph features. We found that subsampling and sampling with replacement both performed surprisingly well, suggesting that they can serve as grounds for confidence in graph features. We also found that the calibration of subsampling and sampling with replacement had different convergence properties, suggesting that one's choice of which to use should depend on the sample size.


Streamlined Variational Inference for Linear Mixed Models with Crossed Random Effects

arXiv.org Machine Learning

W e derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by lack of sparseness in the underlying least squares system. Because of this fact we also consider a hierarchy of relaxations of the mean field product restriction. The least stringent product restriction delivers a high degree of inferential accuracy . However, this accuracy must be mitigated against its higher storage and computing demands. Faster sparse storage and computing alternatives are also provided, but come with the price of diminished inferential accuracy . This article provides full algorithmic details of three variational inference strategies, presents detailed empirical results on their pros and cons and, thus, guides the users on their choice of variational inference approach depending on the problem size and computing resources. Keywords: Mean field variational Bayes; item response theory; Rasch analysis; scalable statistical methodology; sparse least squares systems.


On Tractable Computation of Expected Predictions

arXiv.org Artificial Intelligence

Computing expected predictions has many interesting applications in areas such as fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been proven to be hard in general. In fact, the task is intractable even for simple models such as logistic regression and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations of the latter with respect to the former, as well as moments of any order, in case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm to approximate the expectations, for classification scenarios in which exact computations are intractable. We evaluate the effectiveness of our exact and approximate algorithms in handling missing data during prediction time where they prove to be competitive to standard imputation techniques on a variety of datasets. Finally, we illustrate how expected prediction framework can be used to reason about the behaviour of discriminative models.


Probability for Machine Learning (7-Day Mini-Course)

#artificialintelligence

Probability is a field of mathematics that is universally agreed to be the bedrock for machine learning. Although probability is a large field with many esoteric theories and findings, the nuts and bolts, tools and notations taken from the field are required for machine learning practitioners. With a solid foundation of what probability is, it is possible to focus on just the good or relevant parts. In this crash course, you will discover how you can get started and confidently understand and implement probabilistic methods used in machine learning with Python in seven days. This is a big and important post. You might want to bookmark it. Probability for Machine Learning (7-Day Mini-Course) Photo by Percita, some rights reserved.



Probability for Machine Learning

#artificialintelligence

This book was designed around major ideas and methods that are directly relevant to machine learning algorithms. There are a lot of things you could learn about probability, from theory to abstract concepts to APIs. My goal is to take you straight to developing an intuition for the elements you must understand with laser-focused tutorials. I designed the tutorials to focus on how to get things done with probability. They give you the tools to both rapidly understand and apply each technique or operation. Each tutorial is designed to take you less than one hour to read through and complete, excluding the extensions and further reading. You can choose to work through the lessons one per day, one per week, or at your own pace. I think momentum is critically important, and this book is intended to be read and used, not to sit idle. I would recommend picking a schedule and sticking to it.


High Mutual Information in Representation Learning with Symmetric Variational Inference

arXiv.org Machine Learning

We introduce the Mutual Information Machine (MIM), a novel formulation of representation learning, using a joint distribution over the observations and latent state in an encoder/decoder framework. Our key principles are symmetry and mutual information, where symmetry encourages the encoder and decoder to learn different factorizations of the same underlying distribution, and mutual information, to encourage the learning of useful representations for downstream tasks. Our starting point is the symmetric Jensen-Shannon divergence between the encoding and decoding joint distributions, plus a mutual information encouraging regularizer. We show that this can be bounded by a tractable cross entropy loss function between the true model and a parameterized approximation, and relate this to the maximum likelihood framework. We also relate MIM to variational autoencoders (VAEs) and demonstrate that MIM is capable of learning symmetric factorizations, with high mutual information that avoids posterior collapse.


Automatic Classification of Sexual Harassment Cases

#artificialintelligence

In our case, the data was provided by Safecity India, which is a platform launched on 2012, that crowdsources personal stories of sexual harassment and abuse in public spaces [2]. They have collected over 10,000 stories from over 50 cities in India, Kenya, Cameroon, and Nepal. More specifically they provided us a .cvs Additionally to the focal tasks of this project and as part of the NLP channel we decided to automate the category classification based on the sexual harassment case descriptions. Performing this classification task manually is time-consuming and leaving it entirely on the hands of the victim could produce ambiguity in the discrimination of the categories.


Learning Neural Causal Models from Unknown Interventions

arXiv.org Artificial Intelligence

Meta-learning over a set of distributions can be interpreted as learning different types of parameters corresponding to short-term vs long-term aspects of the mechanisms underlying the generation of data. These are respectively captured by quickly-changing parameters and slowly-changing meta-parameters. We present a new framework for meta-learning causal models where the relationship between each variable and its parents is modeled by a neural network, modulated by structural meta-parameters which capture the overall topology of a directed graphical model. Our approach avoids a discrete search over models in favour of a continuous optimization procedure. We study a setting where interventional distributions are induced as a result of a random intervention on a single unknown variable of an unknown ground truth causal model, and the observations arising after such an intervention constitute one meta-example. To disentangle the slow-changing aspects of each conditional from the fast-changing adaptations to each intervention, we parametrize the neural network into fast parameters and slow meta-parameters. We introduce a meta-learning objective that favours solutions robust to frequent but sparse interventional distribution change, and which generalize well to previously unseen interventions. Optimizing this objective is shown experimentally to recover the structure of the causal graph.


An Introduction to Probabilistic Spiking Neural Networks

arXiv.org Machine Learning

Spiking neural networks (SNNs) are distributed trainable systems whose computing elements, or neurons, are characterized by internal analog dynamics and by digital and sparse synaptic communications. The sparsity of the synaptic spiking inputs and the corresponding event-driven nature of neural processing can be leveraged by energy-efficient hardware implementations, which can offer significant energy reductions as compared to conventional artificial neural networks (ANNs). The design of training algorithms lags behind the hardware implementations. Most existing training algorithms for SNNs have been designed either for biological plausibility or through conversion from pretrained ANNs via rate encoding. This article provides an introduction to SNNs by focusing on a probabilistic signal processing methodology that enables the direct derivation of learning rules by leveraging the unique time-encoding capabilities of SNNs. We adopt discrete-time probabilistic models for networked spiking neurons and derive supervised and unsupervised learning rules from first principles via variational inference. Examples and open research problems are also provided.