Bayesian Inference
A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions
Kim, Sanghwa, Lee, Junghyun, Yun, Se-Young
We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskedastic linear bandits and logistic/Poisson bandits. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it only requires ${O}(1)$ space and time complexity per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{O}\left( d \sqrt{\sum_t g(ฯ_t) \dotฮผ_{t,\star}} + d^2 g_{\max} ฮบ+ d ฮบC \right)$, where $\dotฮผ_{t,\star}$ is the slope of $ฮผ$ around the optimal arm at time $t$, $g(ฯ_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(ฯ_t) = ฯ_t^2$ for heteroskedastic linear bandits, $g(ฯ_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(ฯ_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tildeฮฉ(d \sqrt{\sum_t g(ฯ_t) \dotฮผ_{t,\star}} + d C)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $ฮบ$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.
Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood
Ouyang, Jiangrong, Gong, Mingming, Bondell, Howard
Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite sample regimes. The proposed inference method is robust to small sample sizes and is able to provide accurate uncertainty measurements for policy value evaluation. In addition, it allows for flexible inferences on policy comparison with full uncertainty quantification. We demonstrate the effectiveness of the proposed inference method using Monte Carlo simulations and its application to an adolescent body mass index data set.
PAC-Bayes under potentially heavy tails
WederivePAC-Bayesian learning guarantees forheavy-tailed losses, andobtain a novel optimal Gibbs posterior which enjoys finite-sample excess risk bounds atlogarithmic confidence. Ourcoretechnique itselfmakesuseofPAC-Bayesian inequalities in order to derive a robust risk estimator, which by design is easy to compute.
An Algorithm for Learning Switched Linear Dynamics from Data Guillaume Berger Monal Narasimhamurthy
We present an algorithm for learning switched linear dynamical systems in discrete time from noisy observations of the system's full state or output. Switched linear systems use multiple linear dynamical modes to fit the data within some desired tolerance. They arise quite naturally in applications to robotics and cyberphysical systems. Learning switched systems from data is a NP-hard problem that is nearly identical to the k-linear regression problem of fitting k > 1 linear models to the data. A direct mixed-integer linear programming (MILP) approach yields time complexity that is exponential in the number of data points. In this paper, we modify the problem formulation to yield an algorithm that is linear in the size of the data while remaining exponential in the number of state variables and the desired number of modes. To do so, we combine classic ideas from the ellipsoidal method for solving convex optimization problems, and well-known oracle separation results in non-smooth optimization. We demonstrate our approach on a set of microbenchmarks and a few interesting real-world problems. Our evaluation suggests that the benefits of this algorithm can be made practical even against highly optimized off-the-shelf MILP solvers.