Bayesian Learning
Proper Loss Functions for Nonlinear Hawkes Processes
Menon, Aditya Krishna (Data61) | Lee, Young (Australian National University)
Temporal point processes are a statistical framework for modelling the times at which events of interest occur. The Hawkes process is a well-studied instance of this framework that captures self-exciting behaviour, wherein the occurrence of one event increases the likelihood of future events. Such processes have been successfully applied to model phenomena ranging from earthquakes to behaviour in a social network. We propose a framework to design new loss functions to train linear and nonlinear Hawkes processes. This captures standard maximum likelihood as a special case, but allows for other losses that guarantee convex objective functions (for certain types of kernel), and admit simpler optimisation. We illustrate these points with three concrete examples: for linear Hawkes processes, we provide a least-squares style loss potentially admitting closed-form optimisation; for exponential Hawkes processes, we reduce training to a weighted logistic regression; and for sigmoidal Hawkes processes, we propose an asymmetric form of logistic regression.
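As a rough illustration of the self-exciting behaviour described above (and not of the loss framework the paper proposes), the following Python sketch evaluates the conditional intensity of a linear Hawkes process. The exponential decay kernel, the function name, and the parameters mu, alpha, and beta are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.0):
    """Conditional intensity of a linear Hawkes process at time t.

    Assumed form (illustrative, not taken from the paper):
        lambda(t) = mu + sum over past events s < t of alpha * exp(-beta * (t - s))
    mu is the background rate, alpha the jump caused by each event,
    and beta the rate at which that excitation decays.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )
    return mu + excitation


# Hypothetical event history with two events.
events = [1.0, 2.0]
before = hawkes_intensity(0.5, events)         # no past events: background rate only
just_after = hawkes_intensity(2.0001, events)  # shortly after the second event
later = hawkes_intensity(10.0, events)         # excitation has mostly decayed
```

Each event raises the intensity, which then relaxes back towards the background rate mu; this is the sense in which one event increases the likelihood of future events.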
Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models
Grover, Aditya (Stanford University) | Dhar, Manik (Stanford University) | Ermon, Stefano (Stanford University)
Adversarial learning of probabilistic models has recently emerged as a promising alternative to maximum likelihood. Implicit models such as generative adversarial networks (GAN) often generate better samples compared to explicit models trained by maximum likelihood. Yet, GANs sidestep the characterization of an explicit density which makes quantitative evaluations challenging. To bridge this gap, we propose Flow-GANs, a generative adversarial network for which we can perform exact likelihood evaluation, thus supporting both adversarial and maximum likelihood training. When trained adversarially, Flow-GANs generate high-quality samples but attain extremely poor log-likelihood scores, inferior even to a mixture model memorizing the training data; the opposite is true when trained by maximum likelihood. Results on MNIST and CIFAR-10 demonstrate that hybrid training can attain high held-out likelihoods while retaining visual fidelity in the generated samples.
Constructive Preference Elicitation Over Hybrid Combinatorial Spaces
Dragone, Paolo (University of Trento) | Teso, Stefano (KU Leuven) | Passerini, Andrea (University of Trento)
Preference elicitation is the task of suggesting a highly preferred configuration to a decision maker. The preferences are typically learned by querying the user for choice feedback over pairs or sets of objects. In its constructive variant, new objects are synthesized "from scratch" by maximizing an estimate of the user utility over a combinatorial (possibly infinite) space of candidates. In the constructive setting, most existing elicitation techniques fail because they rely on exhaustive enumeration of the candidates. A previous solution explicitly designed for constructive tasks comes with no formal performance guarantees, and can be very expensive in (or inapplicable to) problems with non-Boolean attributes. We propose the Choice Perceptron, a Perceptron-like algorithm for learning user preferences from set-wise choice feedback over constructive domains and hybrid Boolean-numeric feature spaces. We provide a theoretical analysis of the attained regret that holds for a large class of query selection strategies, and devise a heuristic strategy that aims at optimizing the regret in practice. Finally, we demonstrate its effectiveness by empirical evaluation against existing competitors on constructive scenarios of increasing complexity.
Automatic Parameter Tying: A New Approach for Regularized Parameter Learning in Markov Networks
Chou, Li (The University of Texas at Dallas) | Sahoo, Pracheta (The University of Texas at Dallas) | Sarkhel, Somdeb (Adobe Research) | Ruozzi, Nicholas (The University of Texas at Dallas) | Gogate, Vibhav (The University of Texas at Dallas)
Parameter tying is a regularization method in which parameters (weights) of a machine learning model are partitioned into groups by leveraging prior knowledge and all parameters in each group are constrained to take the same value. In this paper, we consider the problem of parameter learning in Markov networks and propose a novel approach called automatic parameter tying (APT) that uses automatic instead of a priori and soft instead of hard parameter tying as a regularization method to alleviate overfitting. The key idea behind APT is to set up the learning problem as the task of finding parameters and groupings of parameters such that the likelihood plus a regularization term is maximized. The regularization term penalizes models where parameter values deviate from their group mean parameter value. We propose and use a block coordinate ascent algorithm to solve the optimization task. We analyze the sample complexity of our new learning algorithm and show that it yields optimal parameters with high probability when the groups are well separated. Experimentally, we show that our method improves upon L2 regularization and suggest several pragmatic techniques for good practical performance.
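The soft-tying idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' APT objective: the squared-deviation form of the penalty, the function name tying_penalty, and the weight lam are all hypothetical, since the abstract says only that the term penalizes deviation from the group mean.

```python
from collections import defaultdict


def tying_penalty(params, groups, lam=1.0):
    """Soft parameter-tying penalty (assumed squared-deviation form).

    params: list of parameter values.
    groups: list of group labels, one per parameter.
    lam:    hypothetical regularization weight.
    Returns lam * sum_i (params[i] - mean of params in the same group)^2.
    """
    members = defaultdict(list)
    for value, group in zip(params, groups):
        members[group].append(value)
    means = {group: sum(vals) / len(vals) for group, vals in members.items()}
    return lam * sum(
        (value - means[group]) ** 2 for value, group in zip(params, groups)
    )


# Hypothetical parameters forming two loose clusters.
params = [1.0, 1.2, 3.0, 3.4]
groups = [0, 0, 1, 1]
penalty = tying_penalty(params, groups)

# Hard tying is the limiting case: identical values within a group cost nothing.
tied_penalty = tying_penalty([2.0, 2.0, 5.0, 5.0], groups)
```

In a learner of this kind the grouping itself would also be optimized, for example by alternating between updating the parameters and reassigning groups; the sketch covers only the penalty for a fixed grouping.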
SFCN-OPI: Detection and Fine-Grained Classification of Nuclei Using Sibling FCN With Objectness Prior Interaction
Zhou, Yanning (The Chinese University of Hong Kong) | Dou, Qi (The Chinese University of Hong Kong) | Chen, Hao (The Chinese University of Hong Kong) | Qin, Jing (The Hong Kong Polytechnic University) | Heng, Pheng-Ann (The Chinese University of Hong Kong)
Cell nuclei detection and fine-grained classification have been fundamental yet challenging problems in histopathology image analysis. Due to the tiny size of nuclei, significant inter-/intra-class variances, and inferior image quality, previous automated methods suffer from limited accuracy and robustness. Meanwhile, existing approaches usually deal with these two tasks independently, neglecting how closely related they are. In this paper, we present a novel method, a sibling fully convolutional network with objectness prior interaction (called SFCN-OPI), to tackle the two tasks simultaneously and interactively using a unified end-to-end framework. Specifically, the sibling FCN branches share features in earlier layers while holding respective higher layers for specific tasks. More importantly, the detection branch outputs the objectness prior, which dynamically interacts with the fine-grained classification sibling branch during the training and testing processes. With this mechanism, the fine-grained classification successfully focuses on regions with high confidence of nuclei existence and outputs the conditional probability, which in turn benefits the detection through back propagation. Extensive experiments on colon cancer histology images have validated the effectiveness of our proposed SFCN-OPI, and our method has outperformed the state-of-the-art methods by a large margin.
Multimodal Poisson Gamma Belief Network
Wang, Chaojie (Xidian University) | Chen, Bo (Xidian University) | Zhou, Mingyuan (The University of Texas at Austin)
To learn a deep generative model of multimodal data, we propose a multimodal Poisson gamma belief network (mPGBN) that tightly couples the data of different modalities at multiple hidden layers. The mPGBN extracts a nonnegative latent representation in an unsupervised manner using an upward-downward Gibbs sampler. It imposes sparse connections between different layers, making it simple to visualize the generative process and the relationships between the latent features of different modalities. Our experimental results on bi-modal data consisting of images and tags show that the mPGBN can easily impute a missing modality and hence is useful for both image annotation and retrieval. We further demonstrate that the mPGBN achieves state-of-the-art results on unsupervised extraction of latent features from multimodal data.
Predicting Vehicular Travel Times by Modeling Heterogeneous Influences Between Arterial Roads
Achar, Avinash (Tata Consultancy Services) | Sarangan, Venkatesh (Tata Consultancy Services) | Regikumar, Rohith (Tata Consultancy Services) | Sivasubramaniam, Anand (Pennsylvania State University)
Predicting travel times of vehicles in urban settings is a useful and tangible task in the context of intelligent transportation systems. We address the problem of travel time prediction in arterial roads using data sampled from probe vehicles. The literature on methods that use data from probe vehicles is limited. The spatio-temporal dependencies captured by existing data-driven approaches are either too detailed or very simplistic. We strike a balance between these existing data-driven approaches to account for varying degrees of influence a given road may experience from its neighbors, while controlling the number of parameters to be learnt. Specifically, we use a NoisyOR conditional probability distribution (CPD) in conjunction with a dynamic Bayesian network (DBN) to model state transitions of various roads. We propose an efficient algorithm to learn model parameters. We also propose an algorithm for predicting travel times on trips of arbitrary durations. Using synthetic and real-world data traces, we demonstrate the superior performance of the proposed method under different traffic conditions.
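To make the NoisyOR CPD mentioned above concrete, here is a minimal Python sketch of the standard noisy-OR formula. It is not the authors' model: reading the binary parent states as "neighboring road is congested", the function name, and the optional leak term are assumptions made for illustration.

```python
def noisy_or(parent_states, activation_probs, leak=0.0):
    """Standard noisy-OR conditional probability.

    parent_states:    iterable of 0/1 values, one per parent
                      (assumed here to mean "neighboring road is congested").
    activation_probs: q_i, the probability that parent i alone turns the child on.
    leak:             probability that the child is on with no active parent.
    Returns P(child = 1 | parents)
        = 1 - (1 - leak) * product over active parents of (1 - q_i).
    """
    prob_off = 1.0 - leak
    for active, q in zip(parent_states, activation_probs):
        if active:
            prob_off *= 1.0 - q
    return 1.0 - prob_off


# Hypothetical road with three neighbors; the first and third are congested.
p_congested = noisy_or([1, 0, 1], [0.5, 0.9, 0.2])

# With no congested neighbor and no leak, the road stays uncongested.
p_quiet = noisy_or([0, 0, 0], [0.5, 0.9, 0.2])
```

A noisy-OR needs one parameter per parent (plus an optional leak) rather than a full table with 2^n rows for n binary parents, which is how it can assign each neighbor its own degree of influence while keeping the parameter count small.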
Fair Inference on Outcomes
Nabi, Razieh (Johns Hopkins University) | Shpitser, Ilya (Johns Hopkins University)
In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are "sensitive," in the sense of having the potential to create discrimination. We argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.
Sentiment Analysis via Deep Hybrid Textual-Crowd Learning Model
Dizaji, Kamran Ghasedi (University of Pittsburgh) | Huang, Heng (University of Pittsburgh)
Crowdsourcing provides an efficient platform to employ human skills in sentiment analysis, which is a difficult task for automatic language models due to the large variations in context, writing style, viewpoint and so on. However, the standard crowdsourcing aggregation models are inadequate when the number of crowd labels per worker is not sufficient to train parameters, or when it is not feasible to collect labels for each sample in a large dataset. In this paper, we propose a novel hybrid model to exploit both crowd and text data for sentiment analysis, consisting of a generative crowdsourcing aggregation model and a deep sentimental autoencoder. The two sub-models are combined within a probabilistic framework rather than heuristically. We introduce a unified objective function to incorporate the objectives of both sub-models, and derive an efficient optimization algorithm to jointly solve the corresponding problem. Experimental results indicate that our model achieves superior results in comparison with the state-of-the-art models, especially when the crowd labels are scarce.
An Interpretable Joint Graphical Model for Fact-Checking From Crowds
Nguyen, An T. (University of Texas at Austin) | Kharosekar, Aditya (University of Texas at Austin) | Lease, Matthew (University of Texas at Austin) | Wallace, Byron (Northeastern University)
Assessing the veracity of claims made on the Internet is an important, challenging, and timely problem. While automated fact-checking models have potential to help people better assess what they read, we argue such models must be explainable, accurate, and fast to be useful in practice; while prediction accuracy is clearly important, model transparency is critical in order for users to trust the system and integrate their own knowledge with model predictions. To achieve this, we propose a novel probabilistic graphical model (PGM) which combines machine learning with crowd annotations. Nodes in our model correspond to claim veracity, article stance regarding claims, reputation of news sources, and annotator reliabilities. We introduce a fast variational method for parameter estimation. Evaluation across two real-world datasets and three scenarios shows that: (1) joint modeling of sources, claims and crowd annotators in a PGM improves the predictive performance and interpretability for predicting claim veracity; and (2) our variational inference method achieves scalably fast parameter estimation, with only modest degradation in performance compared to Gibbs sampling. Regarding model transparency, we designed and deployed a prototype fact-checker Web tool, including a visual interface for explaining model predictions. Results of a small user study indicate that model explanations improve user satisfaction and trust in model predictions. We share our web demo, model source code, and the 13K crowd labels we collected.