stochastic regime
On Optimal Robustness to Adversarial Corruption in Online Decision Problems
This paper considers two fundamental sequential decision-making problems: the problem of prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we investigate what level of robustness can be achieved against adversarial corruption. The main contribution of this paper is to show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
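The sketch below makes the corrupted stochastic setting concrete for prediction with expert advice. It is a minimal simulation under several assumptions. Expert losses are Bernoulli, with one uniquely best expert. A hypothetical adversary with total corruption budget `budget` raises the best expert's loss to 1 until the budget runs out. The learner is anytime Hedge with the decreasing learning rate eta_t = sqrt(ln N / t). That rate is a standard illustrative choice and not necessarily the tuning analyzed in the paper. The names `simulate`, `hedge_probs`, `gap` and `budget` are illustrative only.

```python
import math
import random


def hedge_probs(cum_loss, eta):
    """Exponential-weights distribution over experts for learning rate eta."""
    m = min(cum_loss)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (L - m)) for L in cum_loss]
    s = sum(w)
    return [x / s for x in w]


def simulate(n_experts=5, horizon=2000, gap=0.2, budget=100.0, seed=0):
    """Run anytime Hedge on stochastic losses corrupted by a simple adversary.

    Hypothetical instance: expert 0 has mean loss 0.5 - gap, all others 0.5.
    The adversary sets expert 0's loss to 1 while its budget lasts; each
    corruption costs |corrupted - true| of the budget.
    Returns (pseudo-regret w.r.t. the uncorrupted means, corruption used).
    """
    rng = random.Random(seed)
    means = [0.5 - gap] + [0.5] * (n_experts - 1)
    cum_loss = [0.0] * n_experts  # cumulative *observed* (possibly corrupted) losses
    regret, used = 0.0, 0.0
    for t in range(1, horizon + 1):
        eta = math.sqrt(math.log(n_experts) / t)  # decreasing learning rate
        p = hedge_probs(cum_loss, eta)
        # Pseudo-regret is measured against the true (uncorrupted) means.
        regret += sum(pi * mu for pi, mu in zip(p, means)) - means[0]
        losses = [1.0 if rng.random() < mu else 0.0 for mu in means]
        cost = 1.0 - losses[0]  # cost of pushing expert 0's loss up to 1
        if used + cost <= budget:  # corrupt only while the budget allows
            used += cost
            losses[0] = 1.0
        cum_loss = [L + l for L, l in zip(cum_loss, losses)]
    return regret, used
```

Calls such as `simulate(budget=0.0)` and `simulate(budget=100.0)` with the same seed draw identical stochastic losses. Comparing them therefore isolates the effect of this one corruption strategy. The paper's robustness result concerns the worst case over all adversaries with a given corruption level, and a single fixed adversary like this one does not capture that.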
Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
This study aims to develop bandit algorithms that automatically exploit tendencies of certain environments to improve performance, without any prior knowledge regarding the environments. We first propose an algorithm for combinatorial semi-bandits with a hybrid regret bound that includes two main features: a best-of-three-worlds guarantee and multiple data-dependent regret bounds. The former means that the algorithm works nearly optimally in all three environments: an adversarial setting, a stochastic setting, and a stochastic setting with adversarial corruptions. The latter implies that, even if the environment is far from exhibiting stochastic behavior, the algorithm performs better as long as the environment is "easy" in terms of certain metrics. The easiness metrics considered in this paper include the cumulative loss of the optimal action, the total quadratic variation of losses, and the path length of the loss sequence. We also show hybrid data-dependent regret bounds for adversarial linear bandits, including the first path-length regret bound that is tight up to logarithmic factors.
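Each of the three easiness metrics can be computed directly from a loss sequence. The sketch below assumes loss vectors l_1, ..., l_T in [0,1]^d and an action set of 0/1 vectors. It also uses common definitions, which are assumptions here because the exact definitions (norms, normalization, comparator) are not stated above:

- L* = min_a sum_t <a, l_t>
- Q = sum_t ||l_t - mean||_2^2, where mean is the average loss vector
- V = sum_t ||l_{t+1} - l_t||_1

The `m_sets` helper, which lists all actions that select exactly m of the d base arms, is a hypothetical action set for illustration.

```python
from itertools import combinations


def optimal_cumulative_loss(losses, actions):
    """L*: smallest cumulative loss of any fixed action (actions are 0/1 vectors)."""
    return min(
        sum(sum(a_i * l_i for a_i, l_i in zip(a, l)) for l in losses)
        for a in actions
    )


def quadratic_variation(losses):
    """Q: sum_t ||l_t - mean||_2^2, where mean is the average loss vector."""
    T, d = len(losses), len(losses[0])
    mean = [sum(l[i] for l in losses) / T for i in range(d)]
    return sum(sum((l[i] - mean[i]) ** 2 for i in range(d)) for l in losses)


def path_length(losses):
    """V: sum_t ||l_{t+1} - l_t||_1 over consecutive loss vectors."""
    return sum(
        sum(abs(b - a) for a, b in zip(prev, cur))
        for prev, cur in zip(losses, losses[1:])
    )


def m_sets(d, m):
    """Hypothetical action set: all 0/1 vectors in {0,1}^d with exactly m ones."""
    return [[1 if i in s else 0 for i in range(d)] for s in combinations(range(d), m)]
```

For example, a constant loss sequence has Q = V = 0, even though it may be chosen adversarially. This is the kind of "easy" environment in which data-dependent bounds can be much smaller than worst-case bounds.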
Taming Heavy-Tailed Losses in Adversarial Bandits and the Best-of-Both-Worlds Setting
Consider the multi-armed bandit (MAB) problem (Auer et al., 2002a,b), which is a useful framework […] Typically, the losses are assumed to have support on a bounded interval (e.g., […]). […] Moreover, while the former enjoy logarithmic regret (i.e., […]), […] These performance discrepancies motivated the study of the Best-of-Both-Worlds (BOBW) setting.
Stepping on the Edge: Curvature Aware Learning Rate Tuners
[…] (Liu and Nocedal, 1989). Similar efforts have been made for Polyak stepsizes (Berrada et al., 2020; Loizou et al., 2021), in addition to new methods that combine distance to optimality with online learning convergence bounds (Cutkosky et al., 2023; […]). Classically inspired methods, however, have generally struggled to gain traction in deep learning.
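For reference, here is a minimal sketch of the classical Polyak stepsize that the works cited above build on. It uses the step eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2 and assumes the optimal value f* is known. The cited methods (Berrada et al., 2020; Loizou et al., 2021) adapt the idea to stochastic deep-learning training, which this sketch does not attempt. The function `polyak_gd`, the diagonal quadratic test problem, and the coefficients `A` are hypothetical.

```python
def polyak_gd(grad_f, f, x0, f_star, steps=50, eps=1e-12):
    """Gradient descent with the classical Polyak stepsize.

    eta_t = (f(x_t) - f_star) / ||grad f(x_t)||^2, which assumes f_star is known.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad_f(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq < eps:  # (near-)stationary point: stop to avoid dividing by ~0
            break
        eta = (f(x) - f_star) / g_sq
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test problem: f(x) = 0.5 * sum_i A_i x_i^2, minimized at 0 with f* = 0.
A = [1.0, 10.0]


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A, x))


def grad_f(x):
    return [a * xi for a, xi in zip(A, x)]
```

On a one-dimensional quadratic f(x) = 0.5 * a * x^2, the Polyak step works out to 1/(2a), so each iteration halves x. For convex f, the step also guarantees that the distance to the minimizer never increases.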