Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret

Aug-22-2025, 00:49:01 GMT–Neural Information Processing Systems

The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures natural behavioral aspects of the users which crucially determine the performance of recommendation platforms, ad placement systems, and more.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Aug-22-2025, 00:49:01 GMT

Conferences PDF

Add feedback

Country:
- North America > United States
  - Texas > Travis County
    - Austin (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - California > San Francisco County
    - San Francisco (0.14)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)

Genre:
- Research Report (0.46)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.68)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
7fccdff3f1457cb7b846596c76c23abd-Paper-Conference.pdf

Similar Docs Excel Report more

Title	Similarity	Source
None found