Goto

Collaborating Authors

 Bayesian Learning


Nonparametric Hawkes Processes: Online Estimation and Generalization Bounds

arXiv.org Machine Learning

In this paper, we design a nonparametric online algorithm for estimating the triggering functions of multivariate Hawkes processes. Unlike parametric estimation, where evolutionary dynamics can be exploited for fast computation of the gradient, and unlike typical function learning, where representer theorem is readily applicable upon proper regularization of the objective function, nonparametric estimation faces the challenges of (i) inefficient evaluation of the gradient, (ii) lack of representer theorem, and (iii) computationally expensive projection necessary to guarantee positivity of the triggering functions. In this paper, we offer solutions to the above challenges, and design an online estimation algorithm named NPOLE-MHP that outputs estimations with a $\mathcal{O}(1/T)$ regret, and a $\mathcal{O}(1/T)$ stability. Furthermore, we design an algorithm, NPOLE-MMHP, for estimation of multivariate marked Hawkes processes. We test the performance of NPOLE-MHP on various synthetic and real datasets, and demonstrate, under different evaluation metrics, that NPOLE-MHP performs as good as the optimal maximum likelihood estimation (MLE), while having a run time as little as parametric online algorithms.


Understanding Naรฏve Bayes Classifier Using R โ€“ R-posts.com

#artificialintelligence

Chaitanya Sagar is the Founder and CEO of Perceptive Analytics. Perceptive Analytics has been chosen as one of the top 10 analytics companies to watch out for by Analytics India Magazine.


Experimentally detecting a quantum change point via Bayesian inference

arXiv.org Machine Learning

Detecting a change point is a crucial task in statistics that has been recently extended to the quantum realm. A source state generator that emits a series of single photons in a default state suffers an alteration at some point and starts to emit photons in a mutated state. The problem consists in identifying the point where the change took place. In this work, we consider a learning agent that applies Bayesian inference on experimental data to solve this problem. This learning machine adjusts the measurement over each photon according to the past experimental results finds the change position in an online fashion. Our results show that the local-detection success probability can be largely improved by using such a machine learning technique. This protocol provides a tool for improvement in many applications where a sequence of identical quantum states is required.


Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data

arXiv.org Machine Learning

This paper analyzes consumer choices over lunchtime restaurants using data from a sample of several thousand anonymous mobile phone users in the San Francisco Bay Area. The data is used to identify users' approximate typical morning location, as well as their choices of lunchtime restaurants. We build a model where restaurants have latent characteristics (whose distribution may depend on restaurant observables, such as star ratings, food category, and price range), each user has preferences for these latent characteristics, and these preferences are heterogeneous across users. Similarly, each item has latent characteristics that describe users' willingness to travel to the restaurant, and each user has individual-specific preferences for those latent characteristics. Thus, both users' willingness to travel and their base utility for each restaurant vary across user-restaurant pairs. We use a Bayesian approach to estimation. To make the estimation computationally feasible, we rely on variational inference to approximate the posterior distribution, as well as stochastic gradient descent as a computational approach. Our model performs better than more standard competing models such as multinomial logit and nested logit models, in part due to the personalization of the estimates. We analyze how consumers re-allocate their demand after a restaurant closes to nearby restaurants versus more distant restaurants with similar characteristics, and we compare our predictions to actual outcomes. Finally, we show how the model can be used to analyze counterfactual questions such as what type of restaurant would attract the most consumers in a given location.


Optimizing Prediction Intervals by Tuning Random Forest via Meta-Validation

arXiv.org Machine Learning

Recent studies have shown that tuning prediction models increases prediction accuracy and that Random Forest can be used to construct prediction intervals. However, to our best knowledge, no study has investigated the need to, and the manner in which one can, tune Random Forest for optimizing prediction intervals { this paper aims to fill this gap. We explore a tuning approach that combines an effectively exhaustive search with a validation technique on a single Random Forest parameter. This paper investigates which, out of eight validation techniques, are beneficial for tuning, i.e., which automatically choose a Random Forest configuration constructing prediction intervals that are reliable and with a smaller width than the default configuration. Additionally, we present and validate three meta-validation techniques to determine which are beneficial, i.e., those which automatically chose a beneficial validation technique. This study uses data from our industrial partner (Keymind Inc.) and the Tukutuku Research Project, related to post-release defect prediction and Web application effort estimation, respectively. Results from our study indicate that: i) the default configuration is frequently unreliable, ii) most of the validation techniques, including previously successfully adopted ones such as 50/50 holdout and bootstrap, are counterproductive in most of the cases, and iii) the 75/25 holdout meta-validation technique is always beneficial; i.e., it avoids the likely counterproductive effects of validation techniques.


Bayesian Deep Convolutional Encoder-Decoder Networks for Surrogate Modeling and Uncertainty Quantification

arXiv.org Machine Learning

We are interested in the development of surrogate models for uncertainty quantification and propagation in problems governed by stochastic PDEs using a deep convolutional encoder-decoder network in a similar fashion to approaches considered in deep learning for image-to-image regression tasks. Since normal neural networks are data intensive and cannot provide predictive uncertainty, we propose a Bayesian approach to convolutional neural nets. A recently introduced variational gradient descent algorithm based on Stein's method is scaled to deep convolutional networks to perform approximate Bayesian inference on millions of uncertain network parameters. This approach achieves state of the art performance in terms of predictive accuracy and uncertainty quantification in comparison to other approaches in Bayesian neural networks as well as techniques that include Gaussian processes and ensemble methods even when the training data size is relatively small. To evaluate the performance of this approach, we consider standard uncertainty quantification benchmark problems including flow in heterogeneous media defined in terms of limited data-driven permeability realizations. The performance of the surrogate model developed is very good even though there is no underlying structure shared between the input (permeability) and output (flow/pressure) fields as is often the case in the image-to-image regression models used in computer vision problems. Studies are performed with an underlying stochastic input dimensionality up to $4,225$ where most other uncertainty quantification methods fail. Uncertainty propagation tasks are considered and the predictive output Bayesian statistics are compared to those obtained with Monte Carlo estimates.


Bayesian Nonparametric Causal Inference: Information Rates and Learning Algorithms

arXiv.org Machine Learning

We investigate the problem of estimating the causal effect of a treatment on individual subjects from observational data, this is a central problem in various application domains, including healthcare, social sciences, and online advertising. Within the Neyman Rubin potential outcomes model, we use the Kullback Leibler (KL) divergence between the estimated and true distributions as a measure of accuracy of the estimate, and we define the information rate of the Bayesian causal inference procedure as the (asymptotic equivalence class of the) expected value of the KL divergence between the estimated and true distributions as a function of the number of samples. Using Fano method, we establish a fundamental limit on the information rate that can be achieved by any Bayesian estimator, and show that this fundamental limit is independent of the selection bias in the observational data. We characterize the Bayesian priors on the potential (factual and counterfactual) outcomes that achieve the optimal information rate. As a consequence, we show that a particular class of priors that have been widely used in the causal inference literature cannot achieve the optimal information rate. On the other hand, a broader class of priors can achieve the optimal information rate. We go on to propose a prior adaptation procedure (which we call the information based empirical Bayes procedure) that optimizes the Bayesian prior by maximizing an information theoretic criterion on the recovered causal effects rather than maximizing the marginal likelihood of the observed (factual) data. Building on our analysis, we construct an information optimal Bayesian causal inference algorithm.


Fair Inference On Outcomes

arXiv.org Machine Learning

In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are "sensitive," in the sense of having potential of creating discrimination. In this paper, we argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl, 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.


Bayesian Inference of Spreading Processes on Networks

arXiv.org Machine Learning

Human susceptibility to epidemics of misinformation and disease has grown manyfold as the world we inhabit keeps getting smaller due to increased access to online information and soaring global mobility. Social media platforms have changed the way we consume information [Schmidt et al., 2017], and more and more people find their news through social media [Newman et al., 2015]. Following the 2016 presidential election in the United States, there have been investigations into the spread of false stories, or "fake news" on social media, and based on web browsing data, archives of fact-checking websites, and results from an online survey, a recent study found that social media were an important source of election news [Allcott and Gentzkow, 2017]. While ascertainment of social network structures is generally difficult using traditional survey-based approaches, such as name generators, which are survey questions designed to solicit information about friends and acquaintances of a subject, online platforms readily capture the structure of large-scale social networks, therefore making them well suited to study spread of information whether accurate or not. Further, although the transmission mechanisms are very different, the spread of information in online systems has many similarities to the spread of infectious diseases among hosts in a population. From a mathematical and statistical point of view, one can therefore investigate the spread of pathogens and the spread of information in the same framework as long as the network structure accurately captures the transmission pathways and the spreading process is parametrized appropriately. In this paper, we consider a simple susceptible-infected (SI) process and a more complex spreading process on a fixed and known network structure. This spreading process may be conceptualized as propagating either a pathogen or a piece of information. We focus on addressing two distinct questions that are relevant in both settings: (1) How to infer the unknown parameters associated with the spreading process?


Learning uncertainty in regression tasks by artificial neural networks

arXiv.org Machine Learning

We suggest a general approach to quantification of different forms of uncertainty in regression tasks performed by artificial neural networks. It is based on the simultaneous training of two neural networks with a joint loss function. One of the networks performs predictions and the other simultaneously quantifies the uncertainty of predictions by estimating the locally averaged loss of the first one. Unlike in many classical uncertainty quantification methods, the targets are not assumed to be sampled from a probability distribution of an a priori given form. We analyze how the hyperparameters affect the learning process and, additionally, show that our method even allows for better predictions compared to standard neural networks without uncertainty counterparts. Finally, we show that particular cases of our approach include maximization of log-likelihood, assuming Gaussian or Laplace noise.