Bayesian Learning
A Simultaneous Transformation and Rounding Approach for Modeling Integer-Valued Data
Kowal, Daniel R., Canale, Antonio
Integer-valued and count data are ubiquitous in many fields, including epidemiology (Osthus et al., 2018; Kowal, 2019), ecology (Dorazio et al., 2005), and insurance (Bening and Korolev, 2012), among others (Cameron and Trivedi, 2013). Count data also serve as an indicator of demand, such as the demand for medical services (Deb and Trivedi, 1997), emergency medical services (Matteson et al., 2011), and call center access (Shen and Huang, 2008). In these applications and many others, integer-valued data are frequently observed jointly with predictors, over time intervals, or across spatial locations. Integer-valued data also exhibit a variety of distributional features, including zero-inflation, skewness, over-or underdispersion, and in some cases may be bounded or censored. Flexible and interpretable models for integervalued processes are therefore highly useful in practice. The most widely-used models for count data build upon the Poisson distribution. However, the limitations of the Poisson distribution are well-known: the distribution is not sufficiently flexible in practice and cannot account for zero-inflation or over-and underdispersion. A common strategy is to generalize the Poisson model by introducing additional parameters.
Modeling Food Popularity Dependencies using Social Media data
Khulbe, Devashish, Pathak, Manu
The rise in popularity of major social media platforms have enabled people to share photos and textual information about their daily life. One of the popular topics about which information is shared is food. Since a lot of media about food are attributed to particular locations and restaurants, information like popularity of spatio-temporal popularity of various cuisines can be analysed. Tracking the popularity of food types and retail locations across space and time can also be useful for business owners and restaurant investors. In this work, we present an approach using off-the shelf machine learning techniques to identify trends and popularity of cuisine types in an area using geo-tagged data from social media, Google images and Yelp. After adjusting for time, we use the Kernel Density Estimation to get hot spots across the location and model the dependencies among food cuisines popularity using Bayesian Networks. We consider the Manhattan borough of New York City as the location for our analyses but the approach can be used for any area with social media data and information about retail businesses.
Selection Via Proxy: Efficient Data Selection For Deep Learning
Coleman, Cody, Yeh, Christopher, Mussmann, Stephen, Mirzasoleiman, Baharan, Bailis, Peter, Liang, Percy, Leskovec, Jure, Zaharia, Matei
Data selection methods such as active learning and core-set selection are useful tools for machine learning on large datasets, but they can be prohibitively expensive to apply in deep learning. Unlike in other areas of machine learning, the feature representations that these techniques depend on are learned in deep learning rather than given, which takes a substantial amount of training time. In this work, we show that we can significantly improve the computational efficiency of data selection in deep learning by using a much smaller proxy model to perform data selection for tasks that will eventually require a large target model (e.g., selecting data points to label for active learning). In deep learning, we can scale down models by removing hidden layers or reducing their dimension to create proxies that are an order of magnitude faster. Although these small proxy models have significantly higher error, we find that they empirically provide useful rankings for data selection that have a high correlation with those of larger models. We evaluate this "selection via proxy" (SVP) approach on several data selection tasks. For active learning, applying SVP to Sener and Savarese [2018]'s recent method for active learning in deep learning gives a 4x improvement in execution time while yielding the same model accuracy. For core-set selection, we show that a proxy model that trains 10x faster than a target ResNet164 model on CIFAR10 can be used to remove 50% of the training data without compromising the accuracy of the target model, making end-to-end training time improvements via core-set selection possible.
From self-tuning regulators to reinforcement learning and back again
Matni, Nikolai, Proutiere, Alexandre, Rantzer, Anders, Tu, Stephen
Machine and reinforcement learning (RL) are being applied to plan and control the behavior of autonomous systems interacting with the physical world -- examples include self-driving vehicles, distributed sensor networks, and agile robots. However, if machine learning is to be applied in these new settings, the resulting algorithms must come with the reliability, robustness, and safety guarantees that are hallmarks of the control theory literature, as failures could be catastrophic. Thus, as RL algorithms are increasingly and more aggressively deployed in safety critical settings, it is imperative that control theorists be part of the conversation. The goal of this tutorial paper is to provide a jumping off point for control theorists wishing to work on RL related problems by covering recent advances in bridging learning and control theory, and by placing these results within the appropriate historical context of the system identification and adaptive control literatures.
Monte Carlo Gradient Estimation in Machine Learning
Mohamed, Shakir, Rosca, Mihaela, Figurnov, Michael, Mnih, Andriy
This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis. In machine learning research, this gradient problem lies at the core of many learning problems, in supervised, unsupervised and reinforcement learning. We will generally seek to rewrite such gradients in a form that allows for Monte Carlo estimation, allowing them to be easily and efficiently used and analysed. We explore three strategies--the pathwise, score function, and measure-valued gradient estimators-- exploring their historical developments, derivation, and underlying assumptions. We describe their use in other fields, show how they are related and can be combined, and expand on their possible generalisations. Wherever Monte Carlo gradient estimators have been derived and deployed in the past, important advances have followed. A deeper and more widely-held understanding of this problem will lead to further advances, and it is these advances that we wish to support.
A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls
Mishchenko, Konstantin, Montgomery, Mallory, Vaggi, Federico
When forecasting time series with a hierarchical structure, the existing state of the art is to forecast each time series independently, and, in a post-treatment step, to reconcile the time series in a way that respects the hierarchy (Hyndman et al., 2011; Wickramasuriya et al., 2018). We propose a new loss function that can be incorporated into any maximum likelihood objective with hierarchical data, resulting in reconciled estimates with confidence intervals that correctly account for additional uncertainty due to imperfect reconciliation. We evaluate our method using a non-linear model and synthetic data on a counterfactual forecasting problem, where we have access to the ground truth and contemporaneous covariates, and show that we largely improve over the existing state-of-the-art method.
An Unsupervised Bayesian Neural Network for Truth Discovery
The problem of estimating event truths from conflicting agent opinions is investigated. An autoencoder learns the complex relationships between event truths, agent reliabilities and agent observations. A Bayesian network model is proposed to guide the learning of the autoencoder by modeling the dependence of agent reliabilities corresponding to different data samples. At the same time, it also models the social relationships between agents in the network. The proposed approach is unsupervised and is applicable when ground truth labels of events are unavailable. A variational inference method is used to jointly estimate the hidden variables in the Bayesian network and the parameters in the autoencoder. Simulations and experiments on real data suggest that the proposed method performs better than several other inference methods, including majority voting, the Bayesian Classifier Combination (BCC) method, the Community BCC method, and the recently proposed VISIT method.
A Review of Statistical Learning Machines from ATR to DNA Microarrays: design, assessment, and advice for practitioners
Statistical Learning is the process of estimating an unknown probabilistic input-output relationship of a system using a limited number of observations; and a statistical learning machine (SLM) is the machine that learned such a process. While their roots grow deeply in Probability Theory, SLMs are ubiquitous in the modern world. Automatic Target Recognition (ATR) in military applications, Computer Aided Diagnosis (CAD) in medical imaging, DNA microarrays in Genomics, Optical Character Recognition (OCR), Speech Recognition (SR), spam email filtering, stock market prediction, etc., are few examples and applications for SLM; diverse fields but one theory. The field of Statistical Learning can be decomposed to two basic subfields, Design and Assessment. Three main groups of specializations-namely statisticians, engineers, and computer scientists (ordered ascendingly by programming capabilities and descendingly by mathematical rigor)-exist on the venue of this field and each takes its elephant bite. Exaggerated rigorous analysis of statisticians sometimes deprives them from considering new ML techniques and methods that, yet, have no "complete" mathematical theory. On the other hand, immoderate add-hoc simulations of computer scientists sometimes derive them towards unjustified and immature results. A prudent approach is needed that has the enough flexibility to utilize simulations and trials and errors without sacrificing any rigor. If this prudent attitude is necessary for this field it is necessary, as well, in other fields of Engineering.
Program Synthesis and Semantic Parsing with Learned Code Idioms
Shin, Richard, Allamanis, Miltiadis, Brockschmidt, Marc, Polozov, Oleksandr
Program synthesis of general-purpose source code from natural language specifications is challenging due to the need to reason about high-level patterns in the target program and low-level implementation details at the same time. In this work, we present PATOIS, a system that allows a neural program synthesizer to explicitly interleave high-level and low-level reasoning at every generation step. It accomplishes this by automatically mining common code idioms from a given corpus, incorporating them into the underlying language for neural synthesis, and training a tree-based neural synthesizer to use these idioms during code generation. We evaluate PATOIS on two complex semantic parsing datasets and show that using learned code idioms improves the synthesizer's accuracy.
A Review on Neural Network Models of Schizophrenia and Autism Spectrum Disorder
Lanillos, Pablo, Oliva, Daniel, Philippsen, Anja, Yamashita, Yuichi, Nagai, Yukie, Cheng, Gordon
This survey presents the most relevant neural network models of autism spectrum disorder and schizophrenia, from the first connectionist models to recent deep network architectures. We analyzed and compared the most representative symptoms with its neural model counterpart, detailing the alteration introduced in the network that generates each of the symptoms, and identifying their strengths and weaknesses. For completeness we additionally cross-compared Bayesian and free-energy approaches. Models of schizophrenia mainly focused on hallucinations and delusional thoughts using neural disconnections or inhibitory imbalance as the predominating alteration. Models of autism rather focused on perceptual difficulties, mainly excessive attention to environment details, implemented as excessive inhibitory connections or increased sensory precision. We found an excessive tight view of the psychopathologies around one specific and simplified effect, usually constrained to the technical idiosyncrasy of the network used. Recent theories and evidence on sensorimotor integration and body perception combined with modern neural network architectures offer a broader and novel spectrum to approach these psychopathologies, outlining the future research on neural networks computational psychiatry, a powerful asset for understanding the inner processes of the human brain.