 Bayesian Inference


Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

arXiv.org Machine Learning

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods that use unlabeled data rely on certain assumptions about the data distribution. When those assumptions are not met in reality, including unlabeled data may actually decrease performance. When studying such methods, it is therefore particularly important to understand the underlying theory. In this review we gather results about the possible gains one can achieve when using semi-supervised learning as well as results about the limits of such methods. More precisely, this review collects the answers to the following questions: What are, in terms of improving supervised methods, the limits of semi-supervised learning? What are the assumptions of different methods? What can we achieve if the assumptions are true? Finally, we also discuss the biggest bottleneck of semi-supervised learning, namely the assumptions these methods make.


Using Contextual Information to Improve Blood Glucose Prediction

arXiv.org Machine Learning

Blood glucose value prediction is an important task in diabetes management. While it is reported that glucose concentration is sensitive to social context such as mood, physical activity, stress, and diet, alongside the influence of diabetes pathologies, we need more research on data and methodologies to incorporate and evaluate signals about such temporal context in prediction models. Person-generated data sources, such as actively contributed surveys as well as passively mined data from social media, offer an opportunity to capture such context. However, the self-reported nature and sparsity of such data mean that they are noisier and less specific than physiological measures such as blood glucose values themselves. Therefore, here we propose a Gaussian Process model to both address these data challenges and combine blood glucose and latent feature representations of contextual data for a novel multi-signal blood glucose prediction task. We find this approach outperforms common methods for multi-variate data, as well as using the blood glucose values in isolation. Given a robust evaluation across two blood glucose datasets with different forms of contextual information, we conclude that multi-signal Gaussian Processes can improve blood glucose prediction by using contextual information and may provide a significant shift in blood glucose prediction research and practice.


DGSAN: Discrete Generative Self-Adversarial Network

arXiv.org Machine Learning

Although GAN-based methods have achieved much in the last few years, they have not been as successful in generating discrete data. The most important challenge of these methods is the difficulty of passing the gradient from the discriminator to the generator when the generator outputs are discrete. Despite several attempts to alleviate this problem, none of the existing GAN-based methods has improved the performance of text generation (using measures that evaluate both the quality and the diversity of generated samples) compared to a generative RNN that is simply trained by the maximum likelihood approach. In this paper, we propose a new framework for generating discrete data by an adversarial approach in which we do not need to pass the gradient to the generator. In the proposed method, the update of either the generator or the discriminator can be accomplished straightforwardly. Moreover, we leverage the discreteness of data to explicitly model the data distribution and ensure the normalization of the generated distribution and consequently the convergence properties of the proposed method. Experimental results generally show the superiority of the proposed DGSAN method compared to the other GAN-based approaches for generating discrete sequential data.


Scalable Modeling of Spatiotemporal Data using the Variational Autoencoder: an Application in Glaucoma

arXiv.org Machine Learning

Submitted to the Annals of Applied Statistics. By Samuel I. Berchuck, Felipe A. Medeiros and Sayan Mukherjee, Duke University.
As big spatial data becomes increasingly prevalent, classical spatiotemporal (ST) methods often do not scale well. While methods have been developed to account for high-dimensional spatial objects, the setting where there are exceedingly large samples of spatial observations has had less attention. The variational autoencoder (VAE), an unsupervised generative model based on deep learning and approximate Bayesian inference, fills this void using a latent variable specification that is inferred jointly across the large number of samples. In this manuscript, we compare the performance of the VAE with a more classical ST method when analyzing longitudinal visual fields from a large cohort of patients in a prospective glaucoma study. Through simulation and a case study, we demonstrate that the VAE is a scalable method for analyzing ST data, when the goal is to obtain accurate predictions. R code to implement the VAE can be found on GitHub: https://github.com/berchuck/vaeST.
1. Introduction. As high-speed computing and medical imaging become increasingly inexpensive, massive amounts of data are generated that have to be analyzed and are often spatial in nature (Bearden and Thompson, 2017; Smith and Nichols, 2018). In the case of medical imaging, the number of patients that can be imaged has skyrocketed in recent years, allowing for studies that include images from many thousands of patients (Van Essen et al., 2013; Miller et al., 2016). The current spatial statistics literature focuses heavily on scalability in terms of the number of spatial locations (Banerjee, 2017), but largely ignores the setting where a joint model is needed for spatiotemporal (ST) data that are generated from a large cohort.
Historically, learning an appropriate generating process in this setting was untenable, typically leading to simplifying assumptions, such as point-wise (PW) modeling of locations across time (Fitzke et al., 1996). In particular, generative models using deep learning have shown great promise in modeling complex distributions, p(x), for x = x_{1:M} in some potentially high-dimensional space X. Sampling from X is often intractable, so instead generative modeling learns a distribution q(x) that can be sampled from and is close to p(x) (Doersch, 2016). As such, generative modeling can be viewed as an approximate method for performing inference in high-dimensional contexts, when there is an overwhelming availability of observations x. Generative modeling, and in particular the variational auto-encoder (VAE), are well-suited for modeling large cohorts of ST data, because they can characterize variability in a spatial data source through joint modeling (Kingma and Welling, 2013).
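
For orientation, the standard training objective of a variational autoencoder (Kingma and Welling, 2013) is a lower bound on the log-likelihood. The latent variable z, the encoder q_φ(z | x), the decoder p_θ(x | z) and the prior p(z) are notation assumed here; they do not appear in the excerpt above, which writes q(x) for the learned sampling distribution.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
```

Maximizing the right-hand side over θ and φ trades reconstruction quality (first term) against keeping the approximate posterior close to the prior (second term).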


Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning

arXiv.org Machine Learning

Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution, especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.
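
To make the notion of "within a certain Wasserstein distance from a nominal distribution" concrete, the following minimal sketch computes the 1-Wasserstein distance between two one-dimensional empirical distributions with the same number of equally weighted samples, where it reduces to the mean absolute difference of the sorted samples. The function name `wasserstein_1d`, the sample values, and the radius are illustrative assumptions, not taken from the tutorial.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """1-Wasserstein distance between two 1-D empirical distributions.

    Assumes both samples have the same size and uniform weights, in which
    case the optimal coupling matches the i-th smallest point of one sample
    with the i-th smallest point of the other.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


# Hypothetical nominal (training) sample and a candidate shifted distribution.
nominal = [0.0, 1.0, 2.0]
shifted = [0.5, 1.5, 2.5]

distance = wasserstein_1d(nominal, shifted)

# Hypothetical ambiguity radius: the candidate belongs to the Wasserstein
# ball around the nominal distribution if its distance does not exceed it.
radius = 0.6
inside_ball = distance <= radius
```

A distributionally robust decision would then be chosen to perform well against the worst distribution inside that ball; the optimization step itself is not sketched here.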


Opponent Aware Reinforcement Learning

arXiv.org Machine Learning

In several reinforcement learning (RL) scenarios such as security settings, there may be adversaries trying to interfere with the reward generating process for their own benefit. We introduce Threatened Markov Decision Processes (TMDPs) as a framework to support an agent against potential opponents in an RL context. We also propose a level-k thinking scheme resulting in a novel learning approach to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries in RL while the agent learns.


Minimum Description Length Revisited

arXiv.org Machine Learning

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes can, to a large extent, be viewed from a unified perspective.


A Bayesian Choice Model for Eliminating Feedback Loops

arXiv.org Machine Learning

Self-reinforcing feedback loops in personalization systems are typically caused by users choosing from a limited set of alternatives presented systematically based on previous choices. We propose a Bayesian choice model built on Luce axioms that explicitly accounts for users' limited exposure to alternatives. Our model is fair (it does not impose negative bias towards unpresented alternatives) and practical (preference estimates are accurately inferred upon observing a small number of interactions). It also allows efficient sampling, leading to a straightforward online presentation mechanism based on Thompson sampling. Our approach achieves low regret in learning to present upon exploration of only a small fraction of possible presentations. The proposed structure can be reused as a building block in interactive systems, e.g., recommender systems, free of feedback loops.
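
As a rough illustration of why a Luce-style model can avoid penalizing unpresented alternatives, the sketch below computes choice probabilities under Luce's choice axiom, where the probability of picking an item depends only on the preference weights of the items actually presented. This is not the paper's Bayesian model or its Thompson sampling mechanism; the function name `luce_choice_probabilities` and the weights are hypothetical.

```python
from typing import Dict, Iterable


def luce_choice_probabilities(
    weights: Dict[str, float], presented: Iterable[str]
) -> Dict[str, float]:
    """Choice probabilities under Luce's choice axiom.

    P(item | presented set) = weight[item] / sum of weights in the set.
    Items that were not presented do not enter the normalization.
    """
    shown = list(presented)
    if not shown:
        raise ValueError("at least one alternative must be presented")
    total = sum(weights[item] for item in shown)
    if total <= 0:
        raise ValueError("presented weights must sum to a positive value")
    return {item: weights[item] / total for item in shown}


# Hypothetical preference weights for three alternatives.
weights = {"a": 2.0, "b": 1.0, "c": 1.0}

# Only "a" and "b" are shown; "c" is neither chosen nor rejected.
probs = luce_choice_probabilities(weights, ["a", "b"])
```

Because "c" is absent from the normalization, a user picking "a" here carries no evidence against "c", which is the intuition behind accounting for limited exposure.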


Hierarchical Bayesian Personalized Recommendation: A Case Study and Beyond

arXiv.org Machine Learning

Items in modern recommender systems are often organized in hierarchical structures. These hierarchical structures and the data within them provide valuable information for building personalized recommendation systems. In this paper, we propose a general hierarchical Bayesian learning framework, HBayes, to learn both the structures and associated latent factors. Furthermore, we develop a variational inference algorithm that is able to learn model parameters with a fast empirical convergence rate. The proposed HBayes is evaluated on two real-world datasets from different domains. The results demonstrate the benefits of our approach on item recommendation tasks, and show that it can outperform the state-of-the-art models in terms of precision, recall, and normalized discounted cumulative gain. To encourage reproducible results, we make our code public in a git repo: https://tinyurl.com/ycruhk4t.


Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes

arXiv.org Machine Learning

The developments of Rademacher complexity and PAC-Bayesian theory have been largely independent. One exception is the PAC-Bayes theorem of Kakade, Sridharan, and Tewari (2008), which is established via Rademacher complexity theory by viewing Gibbs classifiers as linear operators. The goal of this paper is to extend this bridge between Rademacher complexity and state-of-the-art PAC-Bayesian theory. We first demonstrate that one can match the fast rate of Catoni's PAC-Bayes bounds (Catoni, 2007) using shifted Rademacher processes (Wegkamp, 2003; Lecué and Mitchell, 2012; Zhivotovskiy and Hanneke, 2018). We then derive a new fast-rate PAC-Bayes bound in terms of the "flatness" of the empirical risk surface on which the posterior concentrates. Our analysis establishes a new framework for deriving fast-rate PAC-Bayes bounds and yields new insights on PAC-Bayesian theory.