Tracking the Best Expert in Non-stationary Stochastic Environments

Neural Information Processing Systems 

We study the dynamic regret of the multi-armed bandit and experts problems in non-stationary stochastic environments. We introduce a new parameter Λ, which measures the total statistical variance of the loss distributions over T rounds of the process, and study how this amount affects the regret. We investigate the interaction between Λ and Γ, which counts the number of times the distributions change, as well as between Λ and V, which measures how far the distributions deviate over time. One striking result we find is that even when Γ, V, and Λ are all restricted to constants, the regret lower bound in the bandit setting still grows with T. The other highlight is that in the full-information setting, a constant regret, independent of T, becomes achievable with constant Γ and Λ, while with constant V and Λ, the regret still has a T^{1/3} dependency.
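For concreteness, the sketch below formalizes these quantities in one standard way, assuming loss vectors ℓ_t ∈ [0,1]^K drawn independently each round with means μ_t = E[ℓ_t]; the particular norms and aggregations chosen here are illustrative assumptions, not necessarily the paper's exact definitions.

% Dynamic regret: compare the learner's actions i_t to the best action
% of each individual round, rather than to a single fixed action.
\mathcal{R}_T = \sum_{t=1}^{T} \mathbb{E}\bigl[\ell_t(i_t)\bigr] - \sum_{t=1}^{T} \min_{i} \mu_t(i)

% Gamma: number of times the mean loss vector changes (one plausible
% convention counts the initial distribution as the first piece).
\Gamma = 1 + \sum_{t=2}^{T} \mathbf{1}\{\mu_t \neq \mu_{t-1}\}

% V: total drift of the means over time (the sup norm is an assumed choice).
V = \sum_{t=2}^{T} \|\mu_t - \mu_{t-1}\|_{\infty}

% Lambda: total statistical variance of the per-round loss distributions
% (taking the worst arm per round is an assumed aggregation).
\Lambda = \sum_{t=1}^{T} \max_{i} \mathrm{Var}\bigl[\ell_t(i)\bigr]

Under this formalization, Γ, V, and Λ each recover a familiar special case: Γ = 1 and V = 0 give a stationary stochastic environment, while Λ = 0 makes every round's loss deterministic.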