A Probabilistic Programming Approach To Probabilistic Data Analysis

Neural Information Processing Systems

Probabilistic techniques are central to data analysis, but different approaches can be challenging to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include discriminative machine learning, hierarchical Bayesian models, multivariate kernel methods, clustering algorithms, and arbitrary probabilistic programs. We demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that can express data analysis tasks using a modeling definition language and structured query language. The practical value is illustrated in two ways.


Reviews: A Probabilistic Programming Approach To Probabilistic Data Analysis

Neural Information Processing Systems

This paper takes the default BayesDB example of satellite orbits and shows how to find errors in the observed data given expected behaviour. To achieve this, the authors construct a new type of generative population model and implement this model as part of the BayesDB/VentureScript environment. Overall, I like that this pushes for more complex data analysis tasks in a general probabilistic programming environment. The paper, however, is not an easy read, and it is unclear whether the proposed extensions are really that general rather than tuned towards the orbital example. The authors expect deep knowledge of a number of systems (BayesDB, VentureScript, CrossCat) without clearly showing the differences between them.