 Bayesian Inference


Exploration and Exploitation of Victorian Science in Darwin's Reading Notebooks

arXiv.org Artificial Intelligence

Search in an environment with an uncertain distribution of resources involves a trade-off between exploitation of past discoveries and further exploration. This extends to information foraging, where a knowledge-seeker shifts between reading in depth and studying new domains. To study this decision-making process, we examine the reading choices made by one of the most celebrated scientists of the modern era: Charles Darwin. From the full text of books listed in his chronologically organized reading journals, we generate topic models to quantify his local (text-to-text) and global (text-to-past) reading decisions using Kullback-Leibler divergence, a cognitively validated, information-theoretic measure of relative surprise. Rather than a pattern of surprise-minimization, corresponding to a pure exploitation strategy, Darwin's behavior shifts from early exploitation to later exploration, seeking unusually high levels of cognitive surprise relative to previous eras. These shifts, detected by an unsupervised Bayesian model, correlate with major intellectual epochs of his career as identified both by qualitative scholarship and Darwin's own self-commentary. Our methods allow us to compare his consumption of texts with their publication order. We find Darwin's consumption more exploratory than the culture's production, suggesting that underneath gradual societal changes are the explorations of individual synthesis and discovery. Our quantitative methods advance the study of cognitive search through a framework for testing interactions between individual and collective behavior and between short- and long-term consumption choices. This novel application of topic modeling to characterize individual reading complements widespread studies of collective scientific behavior.
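To make the surprise measure in the abstract above concrete, here is a minimal sketch of Kullback-Leibler divergence between two topic distributions. The three-topic mixtures and the function name `kl_divergence` are illustrative assumptions, not taken from the paper, and the abstract does not state which argument order the authors use; this sketch assumes the surprise of a new text is measured relative to the previous one.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) in bits for two discrete distributions.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    Assumes q_i > 0 wherever p_i > 0; otherwise the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return math.inf
            total += p_i * math.log2(p_i / q_i)
    return total


# Hypothetical topic mixtures for two consecutively read books.
previous_text = [0.7, 0.2, 0.1]
similar_text = [0.6, 0.3, 0.1]
distant_text = [0.1, 0.2, 0.7]

# Under the assumed argument order, a low value looks like exploitation
# (staying close to the last text) and a high value looks like exploration.
low_surprise = kl_divergence(similar_text, previous_text)
high_surprise = kl_divergence(distant_text, previous_text)
print(low_surprise < high_surprise)
```

The same function covers the global (text-to-past) case if `q` is replaced by an aggregate distribution over everything read so far; how that aggregate is formed is not specified in the abstract.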


Edward: A library for probabilistic modeling, inference, and criticism

arXiv.org Machine Learning

Probabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edward's design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inferences about the model given data, and criticize the model's fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale.


Margins of discrete Bayesian networks

arXiv.org Machine Learning

Bayesian network models with latent variables are widely used in statistics and machine learning. In this paper we provide a complete algebraic characterization of Bayesian network models with latent variables when the observed variables are discrete and no assumption is made about the state-space of the latent variables. We show that it is algebraically equivalent to the so-called nested Markov model, meaning that the two are the same up to inequality constraints on the joint probabilities. In particular, these two models have the same dimension. The nested Markov model is therefore the best possible description of the latent variable model that avoids consideration of inequalities, which are extremely complicated in general. A consequence of this is that the constraint-finding algorithm of Tian and Pearl (UAI 2002, pp. 519-527) is complete for finding equality constraints. Latent variable models suffer from difficulties of unidentifiable parameters and non-regular asymptotics; in contrast, the nested Markov model is fully identifiable, represents a curved exponential family of known dimension, and can easily be fitted using an explicit parameterization.


3 security analytics approaches that don't work (but could) -- Part 1

#artificialintelligence

Bayesian probability theory states that it's possible to predict with surprising accuracy the likelihood of something happening (or not happening) in a transparent and analytically defensible way. A Bayesian inference network, or model, captures every element of a problem and calculates possible outcomes mathematically. The harder the problem, the better it works--at least in theory. In reality, a typical approach is to gather a roomful of PhDs and spend a lot of time and money building a Bayesian network. Then, with even greater effort and more man-hours, the Bayesian network is turned into software by a roomful of coders.


The Impact of Estimation: A New Method for Clustering and Trajectory Estimation in Patient Flow Modeling

arXiv.org Machine Learning

The ability to accurately forecast and control inpatient census, and thereby workloads, is a critical and longstanding problem in hospital management. The majority of the current literature focuses on optimal scheduling of inpatients, but largely ignores the process of accurately estimating the trajectory of patients throughout the treatment and recovery process. The result is that current scheduling models are optimizing based on inaccurate input data. We developed a Clustering and Scheduling Integrated (CSI) approach to capture patient flows through a network of hospital services. CSI functions by clustering patients into groups based on similarity of trajectory using a novel Semi-Markov model (SMM)-based clustering scheme proposed in this paper, as opposed to clustering by admit type or condition as in previous literature. The methodology is validated by simulation and then applied to real patient data from a partner hospital, where it outperforms current methods. Further, we demonstrate that extant optimization methods achieve significantly better results on key hospital performance measures under CSI, compared with traditional estimation approaches, increasing elective admissions by 97% and utilization by 22% compared to 30% and 8% using traditional estimation techniques. From a theoretical standpoint, the SMM-clustering is a novel approach applicable to any temporal-spatial stochastic data that is prevalent in many industries and application areas.


Applications of Bayes' Theorem • /r/artificial

#artificialintelligence

How is Bayes' Theorem used in artificial intelligence and machine learning? Is there any good book that you can recommend? As a high school student I will be writing an essay about it, and I want to use the best sources that I can find. I need a source that explains Bayes' theorem, its general use, and how it is used in AI or ML.
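As a small illustration of the theorem the question asks about, the sketch below applies Bayes' rule, P(H | E) = P(E | H) P(H) / P(E), to a toy spam-filtering case. The probabilities and the function name `posterior` are made up for illustration and do not come from the post.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' theorem."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H).
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam, and a given word appears
# in 60% of spam messages but only 5% of legitimate ones.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, likelihood_if_false=0.05)
print(round(p_spam_given_word, 2))
```

Seeing the word raises the probability of spam from the 20% prior to 75% in this toy setting. A naive Bayes classifier repeats the same update over many words while assuming they are conditionally independent.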


Bayesian Learning of Consumer Preferences for Residential Demand Response

arXiv.org Machine Learning

In coming years residential consumers will face real-time electricity tariffs with energy prices varying day to day, and effective energy saving will require automation: a recommender system that learns a consumer's preferences from her actions. A consumer chooses a scenario of home appliance use to balance her comfort level and the energy bill. We propose a Bayesian learning algorithm to estimate the comfort level function from the history of appliance use. In numerical experiments with datasets generated from a simulation model of a consumer interacting with small home appliances, the algorithm outperforms popular regression analysis tools. Our approach can be extended to control an air heating and conditioning system, which is responsible for up to half of a household's energy bill.


Modelling Competitive Sports: Bradley-Terry-Élő Models for Supervised and On-Line Learning of Paired Competition Outcomes

arXiv.org Machine Learning

Prediction and modelling of competitive sports outcomes have received much recent attention, especially from the Bayesian statistics and machine learning communities. In the real-world setting of outcome prediction, the seminal Élő update still remains, after more than 50 years, a valuable baseline which is difficult to improve upon, though in its original form it is a heuristic and not a proper statistical "model". Mathematically, the Élő rating system is very closely related to the Bradley-Terry models, which are usually used in an explanatory fashion rather than in a predictive supervised or on-line learning setting. Exploiting this close link between these two model classes and some newly observed similarities, we propose a new supervised learning framework with close similarities to logistic regression, low-rank matrix completion and neural networks. Building on it, we formulate a class of structured log-odds models, unifying the desirable properties found in the above: supervised probabilistic prediction of scores and wins/draws/losses, batch/epoch and on-line learning, as well as the possibility to incorporate features in the prediction, without having to sacrifice simplicity, parsimony of the Bradley-Terry models, or computational efficiency of Élő's original approach. We validate the structured log-odds modelling approach in synthetic experiments and English Premier League outcomes, where the added expressivity yields the best predictions reported in the state of the art, close to the quality of contemporary betting odds.
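The link between the Élő update and Bradley-Terry models mentioned above can be sketched in a few lines. This is the textbook Élő rule, not the structured log-odds models proposed in the paper; the scale of 400 and the K-factor of 32 are conventional defaults assumed here, and the function names are illustrative.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Textbook Elo win probability for player A, logistic in the rating gap."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def bradley_terry_probability(rating_a, rating_b, scale=400.0):
    """Bradley-Terry form s_a / (s_a + s_b) with strengths s = 10 ** (rating / scale)."""
    strength_a = 10.0 ** (rating_a / scale)
    strength_b = 10.0 ** (rating_b / scale)
    return strength_a / (strength_a + strength_b)


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Apply one on-line Elo update; score_a is 1 for a win, 0.5 draw, 0 loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated players; A wins the game.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)

# The two probability formulas agree up to floating-point error.
print(abs(expected_score(1600.0, 1500.0) - bradley_terry_probability(1600.0, 1500.0)) < 1e-12)
```

Read this way, the update is a single stochastic-gradient-style step on a logistic model of the rating difference, which is the kind of connection to logistic regression the abstract alludes to.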


The Causal Frame Problem: An Algorithmic Perspective

arXiv.org Machine Learning

The Frame Problem (FP) is a puzzle in philosophy of mind and epistemology, articulated by the Stanford Encyclopedia of Philosophy as follows: "How do we account for our apparent ability to make decisions on the basis only of what is relevant to an ongoing situation without having explicitly to consider all that is not relevant?" In this work, we focus on the causal variant of the FP, the Causal Frame Problem (CFP). Assuming that a reasoner's mental causal model can be (implicitly) represented by a causal Bayes net, we first introduce a notion called Potential Level (PL). PL, in essence, encodes the relative position of a node with respect to its neighbors in a causal Bayes net. Drawing on the psychological literature on causal judgment, we substantiate the claim that PL may bear on how time is encoded in the mind. Using PL, we propose an inference framework, called the PL-based Inference Framework (PLIF), which permits a boundedly-rational approach to the CFP to be formally articulated at Marr's algorithmic level of analysis. We show that our proposed framework, PLIF, is consistent with a wide range of findings in causal judgment literature, and that PL and PLIF make a number of predictions, some of which are already supported by existing findings.


A Model-based Projection Technique for Segmenting Customers

arXiv.org Machine Learning

We consider the problem of segmenting a large population of customers into non-overlapping groups with similar preferences, using diverse preference observations such as purchases, ratings, clicks, etc. over subsets of items. We focus on the setting where the universe of items is large (ranging from thousands to millions) and unstructured (lacking well-defined attributes) and each customer provides observations for only a few items. These data characteristics limit the applicability of existing techniques in marketing and machine learning. To overcome these limitations, we propose a model-based projection technique, which transforms the diverse set of observations into a more comparable scale and deals with missing data by projecting the transformed data onto a low-dimensional space. We then cluster the projected data to obtain the customer segments. Theoretically, we derive precise necessary and sufficient conditions that guarantee asymptotic recovery of the true customer segments. Empirically, we demonstrate the speed and performance of our method in two real-world case studies: (a) an 84% improvement in the accuracy of new movie recommendations on the MovieLens data set and (b) a 6% improvement in the performance of a similar-item recommendation algorithm on an offline dataset at eBay. We show that our method outperforms standard latent-class and demographic-based techniques.