Uncertainty
Learning Probabilistic Programs
Perov, Yura N., Wood, Frank D.
We develop a technique for generalising from data in which models are samplers represented as program text. We establish encouraging empirical results that suggest that Markov chain Monte Carlo probabilistic programming inference techniques coupled with higher-order probabilistic programming languages are now sufficiently powerful to enable successful inference of this kind in nontrivial domains. We also introduce a new notion of probabilistic program compilation and show how the same machinery might be used in the future to compile probabilistic programs for efficient reusable predictive inference.
Inferring latent structures via information inequalities
Chaves, R., Luft, L., Maciel, T. O., Gross, D., Janzing, D., Schรถlkopf, B.
One of the goals of probabilistic inference is to decide whether an empirically observed distribution is compatible with a candidate Bayesian network. However, Bayesian networks with hidden variables give rise to highly non-trivial constraints on the observed distribution. Here, we propose an information-theoretic approach, based on the insight that conditions on entropies of Bayesian networks take the form of simple linear inequalities. We describe an algorithm for deriving entropic tests for latent structures. The well-known conditional independence tests appear as a special case. While the approach applies for generic Bayesian networks, we presently adopt the causal view, and show the versatility of the framework by treating several relevant problems from that domain: detecting common ancestors, quantifying the strength of causal influence, and inferring the direction of causation from two-variable marginals.
DimmWitted: A Study of Main-Memory Statistical Analytics
We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine. Statistical analytics systems differ from conventional SQL-analytics in the amount and types of memory incoherence they can tolerate. Our goal is to understand tradeoffs in accessing the data in row- or column-order and at what granularity one should share the model and data for a statistical task. We study this new tradeoff space, and discover there are tradeoffs between hardware and statistical efficiency. We argue that our tradeoff study may provide valuable information for designers of analytics engines: for each system we consider, our prototype engine can run at least one popular task at least 100x faster. We conduct our study across five architectures using popular models including SVMs, logistic regression, Gibbs sampling, and neural networks.
Inverse Graphics with Probabilistic CAD Models
Kulkarni, Tejas D., Mansinghka, Vikash K., Kohli, Pushmeet, Tenenbaum, Joshua B.
Recently, multiple formulations of vision problems as probabilistic inversions of generative models based on computer graphics have been proposed. However, applications to 3D perception from natural images have focused on low-dimensional latent scenes, due to challenges in both modeling and inference. Accounting for the enormous variability in 3D object shape and 2D appearance via realistic generative models seems intractable, as does inverting even simple versions of the many-to-many computations that link 3D scenes to 2D images. This paper proposes and evaluates an approach that addresses key aspects of both these challenges. We show that it is possible to solve challenging, real-world 3D vision problems by approximate inference in generative models for images based on rendering the outputs of probabilistic CAD (PCAD) programs. Our PCAD object geometry priors generate deformable 3D meshes corresponding to plausible objects and apply affine transformations to place them in a scene. Image likelihoods are based on similarity in a feature space based on standard mid-level image representations from the vision literature. Our inference algorithm integrates single-site and locally blocked Metropolis-Hastings proposals, Hamiltonian Monte Carlo and discriminative data-driven proposals learned from training data generated from our models. We apply this approach to 3D human pose estimation and object shape reconstruction from single images, achieving quantitative and qualitative performance improvements over state-of-the-art baselines.
Reconstructing Velocities of Migrating Birds from Weather Radar โ A Case Study in Computational Sustainability
Farnsworth, Andrew (Cornell University) | Sheldon, Daniel (University of Massachusetts Amherst) | Geevarghese, Jeffrey (University of Massachusetts Amherst) | Irvine, Jed (Oregon State University) | Doren, Benjamin Van (Cornell University) | Webb, Kevin (Cornell University) | Dietterich, Thomas G. (Oregon State University) | Kelling, Steve (Cornell University)
Each volume scan consists radial velocity data. For any given pulse volume, radial of a sequence of sweeps during which the antenna velocity tells us the component of target velocity in rotates 360 degrees around a vertical axis while the direction of the radar beam, and we have no additional keeping its elevation angle fixed (figure 2). The result information about the component orthogonal of each sweep is a set of raster data products summarizing to the radar beam. However, the overall pattern of the the radar signal returned from targets within sweep often provides clear evidence about the true discrete pulse volumes, which are the portions of the target velocities. In this example, targets to the northeast atmosphere sensed at a particular antenna position (NE) of the radar station have negative radial and range from the radar. The coordinates of each velocities (dark colors), which means they are pulse volume (r, ฯ, ฯ) are measured in a three-dimensional approaching the radar, and targets to the southwest polar coordinate system: r is the distance in (SW) of the radar station have positive radial velocities meters from the antenna, ฯ is the azimuth, which is (light colors), which means they are departing the angle in the horizontal plane between the antenna direction and a fixed reference direction (typically the radar station. We can infer that the targets (in this degrees clockwise from due north), and ฯ is the elevation case, predominantly migrating birds) are moving uniformly angle, which is the angle between the antenna in a SW direction, as shown in panel (c). The direction and its projection onto the horizontal spiral pattern in the velocity image is due to changes plane.
Nonparametric Hierarchical Clustering of Functional Data
Boullรฉ, Marc, Guigourรจs, Romain, Rossi, Fabrice
In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set.
Rough Set Semantics for Identity on the Web
Beek, Wouter (VU University Amsterdam) | Schlobach, Stefan (VU University Amsterdam) | Harmelen, Frank van (VU University Amsterdam)
Identity relations are at the foundation of many logic-based knowledge representations. We argue that the traditional notion of equality, is unsuited for many realistic knowledge representation settings. The classical interpretation of equality is too strong when the equality statements are re-used outside their original context. On the Semantic Web, equality statements are used to interlink multiple descriptions of the same object, using owl:sameAs assertions. And indeed, many practical uses of owl:sameAs are known to violate the formal Leibniz-style semantics. We provide a more flexible semantics to identity by assigning meaning to the subrelations of an identity relation in terms of the predicates that are used in a knowledge-base. Using those indiscernability-predicates, we define upper and lower approximations of equality in the style of rought-set theory, resulting in a quality-measure for identity relations.
Reasoning with Uncertain Inputs in Possibilistic Networks
Benferhat, Salem (Artois University) | Tabia, Karim (Artois University)
Graphical belief models are compact and powerful tools for representing and reasoning under uncertainty. Possibilistic networks are graphical belief models based on possibility theory. In this paper, we address reasoning under uncertain inputs in both quantitative and qualitative possibilistic networks. More precisely, we first provide possibilistic counterparts of Pearl's methods of virtual evidence then compare them with the possibilistic counterparts of Jeffrey's rule of conditioning. As in the probabilistic setting, the two methods are shown to be equivalent in the quantitative setting regarding the existence and uniqueness of the solution. However in the qualitative setting, Pearl's method of virtual evidence which applies directly on graphical models disagrees with Jeffrey's rule and the virtual evidence method. The paper provides the precise situations where the methods are not equivalent. Finally, the paper addresses related issues like transformations from one method to another and commutativity.
Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters
Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models. Somewhat ironically, setting up the hyper-parameters of Bayesian optimisation methods is notoriously hard. While reasonable practical solutions have been advanced, they can often fail to find the best optima. Surprisingly, there is little theoretical analysis of this crucial problem in the literature. To address this, we derive a cumulative regret bound for Bayesian optimisation with Gaussian processes and unknown kernel hyper-parameters in the stochastic setting. The bound, which applies to the expected improvement acquisition function and sub-Gaussian observation noise, provides us with guidelines on how to design hyper-parameter estimation methods. A simple simulation demonstrates the importance of following these guidelines.
Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation
Sasaki, Hiroaki, Noh, Yung-Kyun, Sugiyama, Masashi
Estimation of density derivatives is a versatile tool in statistical data analysis. A naive approach is to first estimate the density and then compute its derivative. However, such a two-step approach does not work well because a good density estimator does not necessarily mean a good density-derivative estimator. In this paper, we give a direct method to approximate the density derivative without estimating the density itself. Our proposed estimator allows analytic and computationally efficient approximation of multi-dimensional high-order density derivatives, with the ability that all hyper-parameters can be chosen objectively by cross-validation. We further show that the proposed density-derivative estimator is useful in improving the accuracy of non-parametric KL-divergence estimation via metric learning. The practical superiority of the proposed method is experimentally demonstrated in change detection and feature selection.