 Bayesian Inference


Learning to Address Health Inequality in the United States with a Bayesian Decision Network

arXiv.org Machine Learning

Life expectancy is a complex outcome driven by genetic, socio-demographic, environmental and geographic factors. Increasing socio-economic and health disparities in the United States are propagating the longevity-gap, making it a cause for concern. Earlier studies have probed individual factors, but an integrated picture that reveals quantifiable actions has been missing. Amidst growing concerns about the further widening of healthcare inequality and differential access created by Artificial Intelligence, it is imperative to explore its potential for illuminating biases and enabling transparent policy decisions. In this work, we reveal actionable interventions for decreasing the longevity-gap in the United States by analyzing a county-level data resource with healthcare, socio-economic, behavioral, education and demographic features. We learn an ensemble-averaged structure, draw inferences using the joint probability distribution and extend it to a Bayesian Decision Network for identifying policy actions. We draw quantitative estimates for the positive roles of diversity, preventive-care quality and stable families within the unified framework of our decision network. Finally, we make this analysis and dashboard available as an interactive web application, enabling users and policy-makers to validate our insights on bridging the longevity-gap and to explore insights beyond those reported in this work.


Solving for multi-class: a survey and synthesis

arXiv.org Machine Learning

We review common methods of solving for multi-class from binary and generalize them to a common framework. Since conditional probabilities are useful both for quantifying the accuracy of an estimate and for calibration purposes, these are a required part of the solution. There is some indication that the best solution for multi-class classification is dependent on the particular dataset. As such, we are particularly interested in data-driven solution design, whether based on a priori considerations or empirical examination of the data. Numerical results indicate that while a one-size-fits-all solution consisting of one-versus-one is appropriate for most datasets, a minority will benefit from a more customized approach. The techniques discussed in this paper allow for a large variety of multi-class configurations and solution methods to be explored so as to optimize classification accuracy, accuracy of conditional probabilities and speed.
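
As a rough illustration of the one-versus-one configuration mentioned above, the following sketch combines pairwise binary estimates into a multi-class label and a crude set of conditional probabilities. It is not the paper's method: the function names, the `pairwise_prob` interface and the toy one-dimensional classifier are all assumptions made for this example, and the probability estimate is a simple normalized sum rather than any particular coupling scheme.

```python
import math
from itertools import combinations


def one_vs_one_predict(pairwise_prob, n_classes, x):
    """Combine pairwise binary estimates into a multi-class prediction.

    pairwise_prob(i, j, x) is a hypothetical callable that returns an
    estimate of P(class = i | class in {i, j}, x) for i < j.
    """
    votes = [0] * n_classes
    score = [0.0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        p = pairwise_prob(i, j, x)
        votes[i if p >= 0.5 else j] += 1  # majority voting
        score[i] += p                     # accumulate pairwise evidence
        score[j] += 1.0 - p
    total = sum(score)  # always n_classes * (n_classes - 1) / 2
    probs = [s / total for s in score]
    # Pick the class with the most votes; break ties by estimated probability.
    label = max(range(n_classes), key=lambda k: (votes[k], probs[k]))
    return label, probs


# Toy pairwise classifier (an assumption): classes centred at 0, 1 and 2.
CENTERS = [0.0, 1.0, 2.0]


def toy_pairwise_prob(i, j, x):
    d = (x - CENTERS[j]) ** 2 - (x - CENTERS[i]) ** 2
    return 1.0 / (1.0 + math.exp(-d))


label, probs = one_vs_one_predict(toy_pairwise_prob, 3, 1.9)
print(label, probs)
```

Voting alone gives only a label; the normalized scores are included because the abstract treats conditional probabilities as a required part of the solution, though a more careful method would be needed for well-calibrated estimates.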


Systems of bounded rational agents with information-theoretic constraints

arXiv.org Artificial Intelligence

Specialization and hierarchical organization are important features of efficient collaboration in economic, artificial, and biological systems. Here, we investigate the hypothesis that both features can be explained by the fact that each entity of such a system is limited in a certain way. We propose an information-theoretic approach based on a Free Energy principle, in order to computationally analyze systems of bounded rational agents that deal with such limitations optimally. We find that specialization allows agents to focus on fewer tasks, thus leading to more efficient execution, but in turn requires coordination in hierarchical structures of specialized experts and coordinating units. Our results suggest that hierarchical architectures of specialized units at lower levels that are coordinated by units at higher levels are optimal, given that each unit's information-processing capability is limited and conforms to constraints on complexity costs.


Alternate Estimation of a Classifier and the Class-Prior from Positive and Unlabeled Data

arXiv.org Machine Learning

We consider the problem of learning a binary classifier only from positive data and unlabeled data (PU learning). This problem arises in various practical situations, such as information retrieval and outlier detection (Elkan and Noto, 2008; Ward et al., 2009; Scott and Blanchard, 2009; Blanchard et al., 2010; Li et al., 2009; Nguyen et al., 2011). One of the theoretical milestones of PU learning is Elkan and Noto (2008), and subsequent work developed unbiased PU learning (du Plessis and Sugiyama, 2014; du Plessis et al., 2015), where the classification risk is estimated in an unbiased manner only from PU data. We consider the case-control scenario (Ward et al., 2009; Elkan and Noto, 2008), where positive data are obtained separately from unlabeled data and unlabeled data are sampled from the whole population. Under this setting, the true class-prior π = p(y = 1) in unlabeled data is needed for the formulation of unbiased PU learning.
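
To make the role of the class-prior concrete, the following is a sketch of the unbiased risk commonly used in the case-control setting, written in the style of du Plessis et al. (2015). The notation is an assumption for illustration: f is the classifier's decision function, ℓ is a margin loss, E_p denotes expectation over positive data p(x | y = +1), and E_u denotes expectation over unlabeled data p(x).

```latex
% Ordinary risk with class-prior \pi = p(y = 1):
%   R(f) = \pi \, E_p[\ell(f(x))] + (1 - \pi) \, E_n[\ell(-f(x))]
% Since p(x) = \pi \, p(x \mid y = +1) + (1 - \pi) \, p(x \mid y = -1),
% the negative-class term can be rewritten using only positive and unlabeled data:
\begin{align}
(1 - \pi) \, E_n[\ell(-f(x))] &= E_u[\ell(-f(x))] - \pi \, E_p[\ell(-f(x))], \\
R(f) &= \pi \, E_p[\ell(f(x))] + E_u[\ell(-f(x))] - \pi \, E_p[\ell(-f(x))].
\end{align}
```

Every term on the right-hand side can be estimated from positive and unlabeled samples alone, but each estimate depends on π, which is why the class-prior must be known or estimated.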


Modelling Latent Travel Behaviour Characteristics with Generative Machine Learning

arXiv.org Machine Learning

The increased use of psychological and perceptual variables in travel choice surveys has motivated a number of studies that investigated the explicit effects of latent behaviour in decision-making. Analysis of travel mode choice has focused on the effects of modal travel cost, time or reliability, and many recent studies have used latent behaviour variables to account for unobservable effects (Paulssen et al. [2014], Bhat et al. [2015]). The Integrated Choice and Latent Variable (ICLV) model is a recent development in structural equation modelling (SEM) to handle hybrid endogenous and exogenous variables in decision-making (Ben-Akiva et al. [2002]). The ICLV model has been shown, in some situations, to produce consistent estimates of model parameters, leading to better explanatory solutions (Vij and Walker [2016]). Structural modelling dates back to the 1970s and was originally used in psychology, sociology and market research; recently it has seen growing applications in travel behaviour involving latent preference "attitudinal" variables and measurement "indicators".


Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data

arXiv.org Machine Learning

Continuous-time Bayesian networks (CTBNs) constitute a general and powerful framework for modeling continuous-time stochastic processes on networks. This makes them particularly attractive for learning the directed structures among interacting entities. However, if the available data is incomplete, one needs to simulate the prohibitively complex CTBN dynamics. Existing approximation techniques, such as sampling and low-order variational methods, either scale unfavorably in system size, or are unsatisfactory in terms of accuracy. Inspired by recent advances in statistical physics, we present a new approximation scheme based on cluster-variational methods significantly improving upon existing variational approximations. We can analytically marginalize the parameters of the approximate CTBN, as these are of secondary importance for structure learning. This recovers a scalable scheme for direct structure learning from incomplete and noisy time-series data. Our approach outperforms existing methods in terms of scalability.


Bayesian Structure Learning by Recursive Bootstrap

arXiv.org Machine Learning

We address the problem of Bayesian structure learning for domains with hundreds of variables by recursively employing the non-parametric bootstrap. We propose a method that covers both model averaging and model selection in the same framework. The proposed method deals with the main weakness of constraint-based learning, namely its sensitivity to errors in the independence tests, by a novel way of combining bootstrap with constraint-based learning. Essentially, we provide an algorithm for learning a tree, in which each node represents a scored CPDAG for a subset of variables and the level of the node corresponds to the maximal order of conditional independencies that are encoded in the graph. As higher order independencies are tested in deeper recursive calls, they benefit from more bootstrap samples, and are therefore more resistant to the curse of dimensionality. Moreover, the re-use of stable low order independencies allows greater computational efficiency. We also provide an algorithm for sampling CPDAGs efficiently from their posterior given the learned tree. We empirically demonstrate that the proposed algorithm scales well to hundreds of variables, and learns better MAP models and more reliable causal relationships between variables than other state-of-the-art methods.


The Inductive Bias of Restricted f-GANs

arXiv.org Machine Learning

Generative adversarial networks are a novel method for statistical inference that have achieved much empirical success; however, the factors contributing to this success remain ill-understood. In this work, we attempt to analyze generative adversarial learning -- that is, statistical inference as the result of a game between a generator and a discriminator -- with the view of understanding how it differs from classical statistical inference solutions such as maximum likelihood inference and the method of moments. Specifically, we provide a theoretical characterization of the distribution inferred by a simple form of generative adversarial learning called restricted f-GANs -- where the discriminator is a function in a given function class, the distribution induced by the generator is restricted to lie in a pre-specified distribution class and the objective is similar to a variational form of the f-divergence. A consequence of our result is that for linear KL-GANs -- that is, when the discriminator is a linear function over some feature space and f corresponds to the KL-divergence -- the distribution induced by the optimal generator is neither the maximum likelihood nor the method of moments solution, but an interesting combination of both.
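
For orientation, the variational form of the f-divergence referred to above is usually written as follows. This is the standard textbook form, given here as an assumption about what the abstract's objective resembles rather than the paper's exact objective; the symbols T (discriminator), \mathcal{T} (its function class), \mathcal{Q} (the generator's distribution class) and f^* (the convex conjugate of f) are notation introduced for this sketch.

```latex
% f-divergence between P and Q with densities p and q:
%   D_f(P \,\|\, Q) = \int q(x) \, f\!\left( \frac{p(x)}{q(x)} \right) dx
% Variational form, with the supremum over all measurable functions T:
\begin{equation}
D_f(P \,\|\, Q) = \sup_{T} \; \Big( E_{x \sim P}[T(x)] - E_{x \sim Q}[f^*(T(x))] \Big),
\qquad f^*(t) = \sup_{u} \, \big( u t - f(u) \big).
\end{equation}
% A restricted variant limits both players:
\begin{equation}
\min_{Q \in \mathcal{Q}} \; \sup_{T \in \mathcal{T}} \;
\Big( E_{x \sim P}[T(x)] - E_{x \sim Q}[f^*(T(x))] \Big).
\end{equation}
```

When \mathcal{T} is restricted, the inner supremum only lower-bounds the true divergence, so the optimal generator need not coincide with the minimizer of D_f itself.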


Bayesian sparse reconstruction: a brute-force approach to astronomical imaging and machine learning

arXiv.org Machine Learning

We present a principled Bayesian framework for signal reconstruction, in which the signal is modelled by basis functions whose number (and form, if required) is determined by the data themselves. This approach is based on a Bayesian interpretation of conventional sparse reconstruction and regularisation techniques, in which sparsity is imposed through priors via Bayesian model selection. We demonstrate our method for noisy 1- and 2-dimensional signals, including astronomical images. Furthermore, by using a product-space approach, the number and type of basis functions can be treated as integer parameters and their posterior distributions sampled directly. We show that order-of-magnitude increases in computational efficiency are possible from this technique compared to calculating the Bayesian evidences separately, and that further computational gains are possible using it in combination with dynamic nested sampling. Our approach can be readily applied to neural networks, where it allows the network architecture to be determined by the data in a principled Bayesian manner by treating the number of nodes and hidden layers as parameters.


Change-Point Detection on Hierarchical Circadian Models

arXiv.org Machine Learning

This paper addresses the problem of change-point detection on sequences of high-dimensional and heterogeneous observations, which also possess a periodic temporal structure. Due to the dimensionality problem, when the time between change-points is on the order of the dimension of the model parameters, drifts in the underlying distribution can be misidentified as changes. To overcome this limitation we assume that the observations lie in a lower dimensional manifold that admits a latent variable representation. In particular, we propose a hierarchical model that is computationally feasible, widely applicable to heterogeneous data and robust to missing instances. Additionally, to deal with the observations' periodic dependencies, we employ a circadian model where the data periodicity is captured by non-stationary covariance functions. We validate the proposed technique on synthetic examples and we demonstrate its utility in the detection of changes for human behavior characterization.