Uncertainty
Compressibility and Generalization in Large-Scale Deep Learning
Zhou, Wenda, Veitch, Victor, Austern, Morgane, Adams, Ryan P., Orbanz, Peter
Modern neural networks are highly overparameterized, with capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be "compressed" to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size. Combined with off-the-shelf compression algorithms, the bound leads to state of the art generalization guarantees; in particular, we provide the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem. As additional evidence connecting compression and generalization, we show that compressibility of models that tend to overfit is limited: We establish an absolute limit on expected compressibility as a function of expected generalization error, where the expectations are over the random choice of training examples. The bounds are complemented by empirical results that show an increase in overfitting implies an increase in the number of bits required to describe a trained network.
Improving Temporal Relation Extraction with a Globally Acquired Statistical Resource
Ning, Qiang, Wu, Hao, Peng, Haoruo, Roth, Dan
Extracting temporal relations (before, after, overlapping, etc.) is a key aspect of understanding events described in natural language. We argue that this task would gain from the availability of a resource that provides prior knowledge in the form of the temporal order that events usually follow. This paper develops such a resource -- a probabilistic knowledge base acquired in the news domain -- by extracting temporal relations between events from the New York Times (NYT) articles over a 20-year span (1987--2007). We show that existing temporal extraction systems can be improved via this resource. As a byproduct, we also show that interesting statistics can be retrieved from this resource, which can potentially benefit other time-aware tasks. The proposed system and resource are both publicly available.
Deep Bayesian Trust : A Dominant Strategy and Fair Reward Mechanism for Crowdsourcing
A common mechanism to assess trust in crowdworkers is to have them answer gold tasks. However, assigning gold tasks to all workers reduces the efficiency of the platform. We propose a mechanism that exploits transitivity so that a worker can be certified as trusted by other trusted workers who solve common tasks. Thus, trust can be derived from a smaller number of gold tasks assignment through multiple layers of peer relationship among the workers, a model we call deep trust. We use the derived trust to incentivize workers for high quality work and show that the resulting mechanism is dominant strategy incentive compatible. We also show that the mechanism satisfies a notion of fairness in that the trust assessment (and thus the reward) of a worker in the limit is independent of the quality of other workers.
Universal Model-free Information Extraction
Li, Bin, Lan, Yueheng, Guo, Weisi, Zhao, Chenglin
Bayesian approaches have been used extensively in scientific and engineering research to quantify uncertainty and extract information. However, its model-dependent nature means that when the a priori model is incomplete or unavailable, there is a severe risk that Bayesian approaches will yield misleading results. Here, we propose a universal model-free information extraction approach, capable of reliably recovering target signals from complex responses. This breakthrough leverages on a data-centric approach, whereby measured data is reconfigured to create an enriched observable space, which in turn is mapped to a well-adapted manifold, thereby detecting crucial information via a reconstructed low-rank phase-space. A Koopman operator is used to transform hidden and complex nonlinear dynamics to linear one, which enables us to detect hidden event of interest from rapidly evolving systems, and relate it to either unobservable stimulus or anomalous behaviour. Thanks to its data-driven nature, our method excludes completely any prior knowledge on governing dynamics. We benchmark the astonishing accuracy of our method on three diverse and challenging problems in: biology, medicine, and engineering. In all cases, our approach outperforms existing state-of-the-art methods, of both Bayesian and non-Bayesian type. By creating a new reliable information analysis paradigm, it is suitable for ubiquitous nonlinear dynamical systems or end-users with little expertise, which permits the unbiased understanding of various mechanisms in the real world.
Causal Data Science for Financial Stress Testing
Gao, Gelin, Mishra, Bud, Ramazzotti, Daniele
The most recent financial upheavals have cast doubt on the adequacy of some of the conventional quantitative risk management strategies, such as VaR (Value at Risk), in many common situations. Consequently, there has been an increasing need for verisimilar financial stress testings, namely simulating and analyzing financial portfolios in extreme, albeit rare scenarios. Unlike conventional risk management which exploits statistical correlations among financial instruments, here we focus our analysis on the notion of probabilistic causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs); SBCNs are probabilistic graphical models that have many attractive features in terms of more accurate causal analysis for generating financial stress scenarios. In this paper, we present a novel approach for conducting stress testing of financial portfolios based on SBCNs in combination with classical machine learning classification tools. The resulting method is shown to be capable of correctly discovering the causal relationships among financial factors that affect the portfolios and thus, simulating stress testing scenarios with a higher accuracy and lower computational complexity than conventional Monte Carlo Simulations.
Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena
Ramazzotti, Daniele, Nobile, Marco S., Antoniotti, Marco, Graudenzi, Alex
One of the critical issues when adopting Bayesian networks (BNs) to model dependencies among random variables is to "learn" their structure. This is a well-known NP-hard problem in its most general and classical formulation, which is furthermore complicated by known pitfalls such as the issue of I-equivalence among different structures. In this work we restrict the investigation to a specific class of networks, i.e., those representing the dynamics of phenomena characterized by the monotonic accumulation of events. Such phenomena allow to set specific structural constraints based on Suppes' theory of probabilistic causation and, accordingly, to define constrained BNs, named Suppes-Bayes Causal Networks (SBCNs). Within this framework, we study the structure learning of SBCNs via extensive simulations with various state-of-the-art search strategies, such as canonical local search techniques and Genetic Algorithms. This investigation is intended to be an extension and an in-depth clarification of our previous works on SBCN structure learning. Among the main results, we show that Suppes' constraints do simplify the learning task, by reducing the solution search space and providing a temporal ordering on the variables, which simplifies the complications derived by I-equivalent structures. Finally, we report on tradeoffs among different optimization techniques that can be used to learn SBCNs.
Nonparametric Bayesian label prediction on a large graph using truncated Laplacian regularization
Hartog, Jarno, van Zanten, Harry
This article describes an implementation of a nonparametric Bayesian approach to solving binary classification problems on graphs. We consider a hierarchical Bayesian approach with a prior that is constructed by truncating a series expansion of the soft label function using the graph Laplacian eigenfunctions as basis functions. We compare our truncated prior to the untruncated Laplacian based prior in simulated and real data examples to illustrate the improved scalability in terms of size of the underlying graph.
A Latent Gaussian Mixture Model for Clustering Longitudinal Data
Bierling, Vanessa S. E., McNicholas, Paul D.
Finite mixture models have become a popular tool for clustering. Amongst other uses, they have been applied for clustering longitudinal data and clustering high-dimensional data. In the latter case, a latent Gaussian mixture model is sometimes used. Although there has been much work on clustering using latent variables and on clustering longitudinal data, respectively, there has been a paucity of work that combines these features. An approach is developed for clustering longitudinal data with many time points based on an extension of the mixture of common factor analyzers model. A variation of the expectation-maximization algorithm is used for parameter estimation and the Bayesian information criterion is used for model selection. The approach is illustrated using real and simulated data.
Generative models for local network community detection
Local network community detection aims to find a single community in a large network, while inspecting only a small part of that network around a given seed node. This is much cheaper than finding all communities in a network. Most methods for local community detection are formulated as ad-hoc optimization problems. In this work, we instead start from a generative model for networks with community structure. By assuming that the network is uniform, we can approximate the structure of unobserved parts of the network to obtain a method for local community detection. We apply this local approximation technique to two variants of the stochastic block model. To our knowledge, this results in the first local community detection methods based on probabilistic models. Interestingly, in the limit, one of the proposed approximations corresponds to conductance, a popular metric in this field. Experiments on real and synthetic datasets show comparable or improved results compared to state-of-the-art local community detection algorithms.
Causal Inference via Kernel Deviance Measures
Mitrovic, Jovana, Sejdinovic, Dino, Teh, Yee Whye
Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, we propose Kernel Conditional Deviance for Causal Inference (KCDC) a fully nonparametric causal discovery method based on purely observational data. From a novel interpretation of the notion of asymmetry between cause and effect, we derive a corresponding asymmetry measure using the framework of reproducing kernel Hilbert spaces. Based on this, we propose three decision rules for causal discovery. We demonstrate the wide applicability of our method across a range of diverse synthetic datasets. Furthermore, we test our method on real-world time series data and the real-world benchmark dataset Tubingen Cause-Effect Pairs where we outperform existing state-of-the-art methods.