Bhattacharya, Anirban


Bandit Learning Through Biased Maximum Likelihood Estimation

arXiv.org Machine Learning

We propose BMLE, a new family of bandit algorithms formulated in a general way based on the Biased Maximum Likelihood Estimation method that originally appeared in the adaptive control literature. We design the cost-bias term to tackle the exploration-exploitation tradeoff in stochastic bandit problems. We provide an explicit closed-form expression for the index of an arm for Bernoulli bandits, which is trivial to compute. We also provide a general recipe for extending the BMLE algorithm to other families of reward distributions. We prove that for Bernoulli bandits, the BMLE algorithm achieves a logarithmic finite-time regret bound and hence attains order-optimality. Through extensive simulations, we demonstrate that the proposed algorithms achieve regret performance comparable to the best of several state-of-the-art baseline methods, while having a significant computational advantage over the other best-performing methods. The generality of the proposed approach makes it possible to address more complex models, including general adaptive control of Markovian systems.
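
As a rough illustration of what an index policy for Bernoulli bandits looks like, the sketch below computes one index per arm and pulls the arm with the largest index. The abstract does not state the BMLE index, so the classical UCB1 index is used here as a stand-in; the function names, arm means, and horizon are all hypothetical choices for the example, not taken from the paper.

```python
import numpy as np

def ucb1_index(successes, pulls, t):
    # Classical UCB1 index (a stand-in, NOT the BMLE index):
    # empirical mean plus an exploration bonus.
    return successes / pulls + np.sqrt(2.0 * np.log(t) / pulls)

def run_index_policy(means, horizon, index_fn, seed=0):
    """Simulate an index policy on a Bernoulli bandit; return pulls and pseudo-regret."""
    rng = np.random.default_rng(seed)
    k = len(means)
    pulls = np.zeros(k)
    successes = np.zeros(k)
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each index is defined
        else:
            arm = int(np.argmax(index_fn(successes, pulls, t)))
        reward = float(rng.random() < means[arm])  # Bernoulli reward
        pulls[arm] += 1
        successes[arm] += reward
        regret += best - means[arm]  # pseudo-regret: gap of the chosen arm
    return pulls, regret

# Hypothetical three-armed instance.
pulls, regret = run_index_policy([0.2, 0.5, 0.8], horizon=5000, index_fn=ucb1_index)
```

Any other closed-form index with the same signature could be passed as `index_fn`, which is why the index is kept separate from the simulation loop.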


Heteroscedastic Bandits with Reneging

arXiv.org Machine Learning

Although shown to be useful in many areas as models for solving sequential decision problems with side observations (contexts), contextual bandits are subject to two major limitations. First, they neglect user "reneging" that occurs in real-world applications. That is, users unsatisfied with an interaction quit future interactions forever. Second, they assume that the reward distribution is homoscedastic, an assumption that is often violated by real-world datasets, e.g., datasets from finance. We propose a novel model of "heteroscedastic contextual bandits with reneging" to overcome the two limitations. Our model allows each user to have a distinct "acceptance level," with any interaction falling short of that level resulting in that user reneging. It also allows the variance to be a function of context. We develop a UCB-type policy, called HR-UCB, and prove that with high probability it achieves $\mathcal{O}\big(\sqrt{T}(\log T)^{3/2}\big)$ regret.


$\alpha$-Variational Inference with Statistical Guarantees

arXiv.org Machine Learning

We propose a family of variational approximations to Bayesian posterior distributions, called $\alpha$-VB, with provable statistical guarantees. The standard variational approximation is a special case of $\alpha$-VB with $\alpha=1$. When $\alpha \in (0,1]$, a novel class of variational inequalities is developed for linking the Bayes risk under the variational approximation to the objective function in the variational optimization problem, implying that maximizing the evidence lower bound in variational inference has the effect of minimizing the Bayes risk within the variational density family. Operating in a frequentist setup, the variational inequalities imply that point estimates constructed from the $\alpha$-VB procedure converge at an optimal rate to the true parameter in a wide range of problems. We illustrate our general theory with a number of examples, including the mean-field variational approximation to low- or high-dimensional Bayesian linear regression with spike and slab priors, Gaussian mixture models, latent Dirichlet allocation, and (mixture of) Gaussian variational approximation in regular parametric models.
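
For orientation, one common way to write a tempered variational objective of this kind is sketched below. It assumes a model without latent variables, and the notation (variational family $\Gamma$, prior $\pi$, data $Y^n$) is ours rather than the abstract's; the paper should be consulted for the exact definition.

$$
\hat{q}_\alpha \;=\; \operatorname*{arg\,min}_{q \in \Gamma} \left\{ -\int \log p(Y^n \mid \theta)\, q(d\theta) \;+\; \frac{1}{\alpha}\, D_{\mathrm{KL}}(q \,\|\, \pi) \right\}, \qquad \alpha \in (0,1].
$$

Under this form, setting $\alpha = 1$ gives the negative evidence lower bound, and for general $\alpha$ the minimizer is the member of $\Gamma$ closest in Kullback-Leibler divergence to the fractional posterior proportional to $p(Y^n \mid \theta)^{\alpha}\, \pi(\theta)$.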


On Statistical Optimality of Variational Bayes

arXiv.org Machine Learning

The article addresses a long-standing open problem on the justification of using variational Bayes methods for parameter estimation. We provide general conditions for obtaining optimal risk bounds for point estimates acquired from mean-field variational Bayesian inference. The conditions pertain to the existence of certain test functions for the distance metric on the parameter space and minimal assumptions on the prior. A general recipe for verification of the conditions is outlined which is broadly applicable to existing Bayesian models with or without latent variables. As illustrations, specific applications to Latent Dirichlet Allocation and Gaussian mixture models are discussed.


Frequentist coverage and sup-norm convergence rate in Gaussian process regression

arXiv.org Machine Learning

Gaussian process (GP) regression is a powerful interpolation technique due to its flexibility in capturing non-linearity. In this paper, we provide a general framework for understanding the frequentist coverage of point-wise and simultaneous Bayesian credible sets in GP regression. As an intermediate result, we develop a Bernstein-von Mises type result under the supremum norm in random design GP regression. Identifying both the mean and covariance function of the posterior distribution of the Gaussian process as regularized $M$-estimators, we show that the sampling distribution of the posterior mean function and the centered posterior distribution can be respectively approximated by two population-level GPs. By developing a comparison inequality between two GPs, we provide an exact characterization of frequentist coverage probabilities of Bayesian point-wise credible intervals and simultaneous credible bands of the regression function. Our results show that inference based on GP regression tends to be conservative; when the prior is under-smoothed, the resulting credible intervals and bands have minimax-optimal sizes, with their frequentist coverage converging to a non-degenerate value between their nominal level and one. As a byproduct of our theory, we show that GP regression also yields a minimax-optimal posterior contraction rate relative to the supremum norm, which provides positive evidence on the long-standing problem of the optimal supremum norm contraction rate in GP regression.
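
To make the objects in this abstract concrete, the following is a minimal sketch of the standard GP regression posterior and its point-wise 95% credible intervals under a random design. The squared-exponential kernel, length scale, noise variance, and test function are all assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel on 1-D inputs (an assumed choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01, length_scale=0.2):
    """Posterior mean and point-wise variance of f at x_test (zero prior mean)."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance is k(x, x) = 1 for this kernel.
    var = 1.0 - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off

def f(x):
    # Hypothetical true regression function.
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 50)  # random design points
y_train = f(x_train) + 0.1 * rng.standard_normal(50)
x_test = np.linspace(0.0, 1.0, 101)

mean, var = gp_posterior(x_train, y_train, x_test)
half_width = 1.96 * np.sqrt(var)  # point-wise 95% credible interval
lower, upper = mean - half_width, mean + half_width

# Fraction of grid points whose interval contains f, for this one dataset.
covered_fraction = np.mean((f(x_test) >= lower) & (f(x_test) <= upper))
```

Note that `covered_fraction` is computed over a grid for a single simulated dataset. Frequentist coverage in the sense of the abstract would instead fix a point and average over repeated draws of the data.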