stochastic environment
Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
We consider online learning algorithms that guarantee worst-case regret rates in adversarial environments (so they can be deployed safely and will perform robustly), yet adapt optimally to favorable stochastic environments (so they will perform well in a variety of settings of practical importance). We quantify the friendliness of stochastic environments by means of the well-known Bernstein (a.k.a. generalized Tsybakov margin) condition.
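To make the setting concrete, below is a minimal sketch of the Hedge (exponential weights) forecaster, the prediction-with-expert-advice setting in which algorithms like Squint operate. The learning-rate choice and the random loss sequence are illustrative assumptions, not the paper's method; Squint and MetaGrad obtain their adaptivity by aggregating over multiple learning rates, which is omitted here.

```python
# Minimal sketch of Hedge (exponential weights), assuming a (T, K) loss matrix.
import numpy as np

def hedge(losses, eta):
    """Run exponential weights on a (T, K) array of expert losses.

    Returns the learner's cumulative loss and the best expert's cumulative loss;
    their difference is the regret that worst-case and fast-rate bounds control.
    """
    T, K = losses.shape
    log_w = np.zeros(K)                  # log-weights, start uniform
    learner_loss = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                     # normalized distribution over experts
        learner_loss += p @ losses[t]    # expected loss of the learner's play
        log_w -= eta * losses[t]         # multiplicative-weights update
    best_expert_loss = losses.sum(axis=0).min()
    return learner_loss, best_expert_loss

# Usage: the worst-case tuning eta ~ sqrt(ln K / T) gives O(sqrt(T ln K)) regret
# on any (adversarial) sequence; in friendly stochastic environments the regret
# can shrink much faster, which is what the Bernstein condition quantifies.
rng = np.random.default_rng(0)
T, K = 1000, 10
losses = rng.uniform(size=(T, K))
learner, best = hedge(losses, eta=np.sqrt(np.log(K) / T))
print(f"regret = {learner - best:.2f}")
```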
Fast Rates in Stochastic Online Convex Optimization by Exploiting the Curvature of Feasible Sets
In this work, we explore online convex optimization (OCO) and introduce a new condition and analysis that provide fast rates by exploiting the curvature of feasible sets. In online linear optimization, it is known that if the average gradient of the loss functions exceeds a certain threshold, the curvature of the feasible set can be exploited by the follow-the-leader (FTL) algorithm to achieve logarithmic regret. This study reveals that algorithms adaptive to the curvature of loss functions can also leverage the curvature of feasible sets. In particular, we first prove that if an optimal decision lies on the boundary of the feasible set and the gradient of the underlying loss function is non-zero there, then the algorithm achieves a regret bound of O(ρ ln T) in stochastic environments, where ρ > 0 is the radius of the smallest sphere that includes the optimal decision and encloses the feasible set. Unlike existing approaches, ours works directly with convex loss functions, exploits the curvature of the loss functions simultaneously, and requires only a local property of the feasible set to achieve logarithmic regret. Additionally, the algorithm achieves an O(√T) regret even in adversarial environments, in which FTL suffers an Ω(√T) regret, and an O(ρ ln T + √(Cρ ln T)) regret in corrupted stochastic environments with corruption level C.
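As a point of reference for the classical result mentioned above, here is a minimal sketch (not the paper's algorithm) of follow-the-leader for online linear optimization over the Euclidean unit ball, the textbook case where the curvature of the feasible set yields logarithmic regret when the average gradient stays bounded away from zero. The Gaussian loss sequence is an illustrative stochastic assumption.

```python
# Minimal FTL sketch on the unit ball {||x|| <= 1} with linear losses <g_t, x>.
import numpy as np

def ftl_unit_ball(gradients):
    """Play follow-the-leader and return the regret against the best fixed point.

    The leader minimizes the cumulative linear loss over the ball, which has the
    closed form x_t = -G / ||G|| with G the sum of past gradients.
    """
    d = gradients.shape[1]
    G = np.zeros(d)
    learner_loss = 0.0
    for g in gradients:
        norm = np.linalg.norm(G)
        x = -G / norm if norm > 0 else np.zeros(d)   # leader on the curved boundary
        learner_loss += g @ x
        G += g
    best_loss = -np.linalg.norm(G)   # best fixed action in hindsight is -G/||G||
    return learner_loss - best_loss

# Stochastic environment: gradients concentrate around a non-zero mean, so the
# optimum sits on the curved boundary and FTL's regret grows only logarithmically.
rng = np.random.default_rng(1)
grads = rng.normal(loc=[0.5, 0.0], scale=0.3, size=(5000, 2))
print(f"FTL regret on the unit ball: {ftl_unit_ball(grads):.2f}")
```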
Supplementary Materials - Adaptive Online Replanning with Diffusion Models
Siyuan Zhou
In the supplementary, we first discuss the experimental details and hyperparameters in Sections A and B, and further present the visualization in RLBench in Section C. An MLP with 512 hidden units and Mish activations is used. The probability ϵ of random actions is set to 0.03 in stochastic environments, so the sampled trajectories can still lead to collisions; Figure 1 illustrates a problematic sampled trajectory after execution. We further evaluate the performance with different replanning steps in Table 1.
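For illustration, below is a minimal sketch of the ϵ-random-action setting described above, with a simple replan-on-schedule loop. The `env` and `planner` interfaces are hypothetical stand-ins (a gym-style step/reset environment and a function returning an action sequence); only the value ϵ = 0.03 comes from the supplementary itself.

```python
# Sketch of executing planned actions under random-action noise, with periodic replanning.
import numpy as np

EPSILON = 0.03  # probability that the executed action is replaced by a random one

def rollout_with_replanning(env, planner, horizon, replan_every, rng):
    """Execute planned actions; with prob. EPSILON inject a random action, and
    replan from the (possibly drifted) state every `replan_every` steps.
    Assumes planner(obs) returns at least `replan_every` actions."""
    obs = env.reset()
    actions = planner(obs)
    for t in range(horizon):
        a = actions[t % replan_every]
        if rng.random() < EPSILON:            # stochastic perturbation of the action
            a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        if done:
            break
        if (t + 1) % replan_every == 0:       # replanning step
            actions = planner(obs)
    return obs
```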