Sicily
A single algorithm for both restless and rested rotting bandits
Seznec, Julien, Ménard, Pierre, Lazaric, Alessandro, Valko, Michal
In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated to the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings, where arms are rotting (i.e., their value decrease over time). These problems were thought to be significantly different, since Levine et al. (2017) showed that state-of-the-art algorithms for restless bandit perform poorly in the rested rotting setting. In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rotting rested and restless bandit, without any prior knowledge of the setting (rested or restless) and the type of non-stationarity (e.g., piece-wise constant, bounded variation). This is in striking contrast with previous negative results showing that no algorithm can achieve similar results as soon as rewards are allowed to increase. We confirm our theoretical findings on a number of synthetic and dataset-based experiments.
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
Adaptive multi-fidelity optimization with fast learning rates
Fiegel, Come, Gabillon, Victor, Valko, Michal
In multi-fidelity optimization, biased approximations of varying costs of the target function are available. This paper studies the problem of optimizing a locally smooth function with a limited budget, where the learner has to make a tradeoff between the cost and the bias of these approximations. We first prove lower bounds for the simple regret under different assumptions on the fidelities, based on a cost-to-bias function. We then present the Kometo algorithm which achieves, with additional logarithmic factors, the same rates without any knowledge of the function smoothness and fidelity assumptions, and improves previously proven guarantees. We finally empirically show that our algorithm outperforms previous multi-fidelity optimization methods without the knowledge of problem-dependent parameters.
- North America > United States > Maryland > Baltimore (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
- (12 more...)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (16 more...)
- North America > United States > Maryland > Baltimore (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (18 more...)
- North America > United States > Maryland > Baltimore (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (16 more...)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- (7 more...)
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- (22 more...)
Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem
The joint decisions of the agents influence both individual rewards and the transition of the environment. MARL in general is occupied with leading the multi-agent system to a favorable outcome. Through the lens of game theory, the notion of a "favorable outcome" is formally defined through concepts like a Nash
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (6 more...)
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Vietnam > Hanoi > Hanoi (0.05)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- (7 more...)