Directed Networks
Learning Probabilistic Programs
Perov, Yura N., Wood, Frank D.
We develop a technique for generalising from data in which models are samplers represented as program text. We establish encouraging empirical results that suggest that Markov chain Monte Carlo probabilistic programming inference techniques coupled with higher-order probabilistic programming languages are now sufficiently powerful to enable successful inference of this kind in nontrivial domains. We also introduce a new notion of probabilistic program compilation and show how the same machinery might be used in the future to compile probabilistic programs for efficient reusable predictive inference.
Inferring latent structures via information inequalities
Chaves, R., Luft, L., Maciel, T. O., Gross, D., Janzing, D., Schรถlkopf, B.
One of the goals of probabilistic inference is to decide whether an empirically observed distribution is compatible with a candidate Bayesian network. However, Bayesian networks with hidden variables give rise to highly non-trivial constraints on the observed distribution. Here, we propose an information-theoretic approach, based on the insight that conditions on entropies of Bayesian networks take the form of simple linear inequalities. We describe an algorithm for deriving entropic tests for latent structures. The well-known conditional independence tests appear as a special case. While the approach applies for generic Bayesian networks, we presently adopt the causal view, and show the versatility of the framework by treating several relevant problems from that domain: detecting common ancestors, quantifying the strength of causal influence, and inferring the direction of causation from two-variable marginals.
DimmWitted: A Study of Main-Memory Statistical Analytics
We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine. Statistical analytics systems differ from conventional SQL-analytics in the amount and types of memory incoherence they can tolerate. Our goal is to understand tradeoffs in accessing the data in row- or column-order and at what granularity one should share the model and data for a statistical task. We study this new tradeoff space, and discover there are tradeoffs between hardware and statistical efficiency. We argue that our tradeoff study may provide valuable information for designers of analytics engines: for each system we consider, our prototype engine can run at least one popular task at least 100x faster. We conduct our study across five architectures using popular models including SVMs, logistic regression, Gibbs sampling, and neural networks.
Reconstructing Velocities of Migrating Birds from Weather Radar โ A Case Study in Computational Sustainability
Farnsworth, Andrew (Cornell University) | Sheldon, Daniel (University of Massachusetts Amherst) | Geevarghese, Jeffrey (University of Massachusetts Amherst) | Irvine, Jed (Oregon State University) | Doren, Benjamin Van (Cornell University) | Webb, Kevin (Cornell University) | Dietterich, Thomas G. (Oregon State University) | Kelling, Steve (Cornell University)
Each volume scan consists radial velocity data. For any given pulse volume, radial of a sequence of sweeps during which the antenna velocity tells us the component of target velocity in rotates 360 degrees around a vertical axis while the direction of the radar beam, and we have no additional keeping its elevation angle fixed (figure 2). The result information about the component orthogonal of each sweep is a set of raster data products summarizing to the radar beam. However, the overall pattern of the the radar signal returned from targets within sweep often provides clear evidence about the true discrete pulse volumes, which are the portions of the target velocities. In this example, targets to the northeast atmosphere sensed at a particular antenna position (NE) of the radar station have negative radial and range from the radar. The coordinates of each velocities (dark colors), which means they are pulse volume (r, ฯ, ฯ) are measured in a three-dimensional approaching the radar, and targets to the southwest polar coordinate system: r is the distance in (SW) of the radar station have positive radial velocities meters from the antenna, ฯ is the azimuth, which is (light colors), which means they are departing the angle in the horizontal plane between the antenna direction and a fixed reference direction (typically the radar station. We can infer that the targets (in this degrees clockwise from due north), and ฯ is the elevation case, predominantly migrating birds) are moving uniformly angle, which is the angle between the antenna in a SW direction, as shown in panel (c). The direction and its projection onto the horizontal spiral pattern in the velocity image is due to changes plane.
Sequential Decision Making in Computational Sustainability via Adaptive Submodularity
Krause, Andreas (ETH Zurich) | Golovin, Daniel (Google) | Converse, Sarah (USGS Patuxent Wildlife Research Center)
Many problems in computational sustainability require making a sequence of decisions in complex, uncertain environments. Such problems are generally notoriously difficult. In this article, we review the recently discovered notion of adaptive submodularity, an intuitive diminishing returns condition that generalizes the classical notion of submodular set functions to sequential decision problems. Problems exhibiting the adaptive submodularity property can be efficiently and provably near-optimally solved using simple myopic policies. We illustrate this concept in several case studies of interest in computational sustainability: First, we demonstrate how it can be used to efficiently plan for resolving uncertainty in adaptive management scenarios. Secondly, we show how it applies to dynamic conservation planning for protecting endangered species, a case study carried out in collaboration with the US Geological Survey and the US Fish and Wildlife Service.
Nonparametric Hierarchical Clustering of Functional Data
Boullรฉ, Marc, Guigourรจs, Romain, Rossi, Fabrice
In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set.
Relational Logistic Regression
Kazemi, Seyed Mehran (University of British Columbia) | Buchman, David (University of British Columbia) | Kersting, Kristian (Technical University of Dortmund) | Natarajan, Sriraam (Indiana University) | Poole, David (University of British Columbia)
Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this paper we consider extending logistic regression to relational models, where we want to model varying populations and interactions among parents. In this paper, we first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which we show can represent arbitrary linear (in population size) decision thresholds, whereas the traditional logistic regression cannot. Then we examine representing interactions among the parents of a child node, and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. Finally, we show how other well-known aggregators can be represented using this relational logistic regression.
Mass-Univariate Hypothesis Testing on MEEG Data using Cross-Validation
Recent advances in statistical theory, together with advances in the computational power of computers, provide alternative methods to do mass-univariate hypothesis testing in which a large number of univariate tests, can be properly used to compare MEEG data at a large number of time-frequency points and scalp locations. One of the major problematic aspects of this kind of mass-univariate analysis is due to high number of accomplished hypothesis tests. Hence procedures that remove or alleviate the increased probability of false discoveries are crucial for this type of analysis. Here, I propose a new method for mass-univariate analysis of MEEG data based on cross-validation scheme. In this method, I suggest a hierarchical classification procedure under k-fold cross-validation to detect which sensors at which time-bin and which frequency-bin contributes in discriminating between two different stimuli or tasks. To achieve this goal, a new feature extraction method based on the discrete cosine transform (DCT) employed to get maximum advantage of all three data dimensions. Employing cross-validation and hierarchy architecture alongside the DCT feature space makes this method more reliable and at the same time enough sensitive to detect the narrow effects in brain activities.
Combining predictions from linear models when training and test inputs differ
Methods for combining predictions from different models in a supervised learning setting must somehow estimate/predict the quality of a model's predictions at unknown future inputs. Many of these methods (often implicitly) make the assumption that the test inputs are identical to the training inputs, which is seldom reasonable. By failing to take into account that prediction will generally be harder for test inputs that did not occur in the training set, this leads to the selection of too complex models. Based on a novel, unbiased expression for KL divergence, we propose XAIC and its special case FAIC as versions of AIC intended for prediction that use different degrees of knowledge of the test inputs. Both methods substantially differ from and may outperform all the known versions of AIC even when the training and test inputs are iid, and are especially useful for deterministic inputs and under covariate shift. Our experiments on linear models suggest that if the test and training inputs differ substantially, then XAIC and FAIC predictively outperform AIC, BIC and several other methods including Bayesian model averaging.
Divide-and-Conquer Learning by Anchoring a Conical Hull
Zhou, Tianyi, Bilmes, Jeff, Guestrin, Carlos
We reduce a broad class of machine learning problems, usually addressed by EM or sampling, to the problem of finding the $k$ extremal rays spanning the conical hull of a data point set. These $k$ "anchors" lead to a global solution and a more interpretable model that can even outperform EM and sampling on generalization error. To find the $k$ anchors, we propose a novel divide-and-conquer learning scheme "DCA" that distributes the problem to $\mathcal O(k\log k)$ same-type sub-problems on different low-D random hyperplanes, each can be solved by any solver. For the 2D sub-problem, we present a non-iterative solver that only needs to compute an array of cosine values and its max/min entries. DCA also provides a faster subroutine for other methods to check whether a point is covered in a conical hull, which improves algorithm design in multiple dimensions and brings significant speedup to learning. We apply our method to GMM, HMM, LDA, NMF and subspace clustering, then show its competitive performance and scalability over other methods on rich datasets.