Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
Letard, Alexandre; Amghar, Tassadit; Camp, Olivier; Gutowski, Nicolas
arXiv.org Artificial Intelligence
Recent work on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) shows good results on a global accuracy metric. In recommender systems, this can be achieved through personalization. However, with a combinatorial online learning approach, personalization requires a large amount of user feedback, which can be hard to acquire when users must be solicited directly and frequently. For many fields of activity undergoing the digitization of their business, online learning is unavoidable, so a number of approaches for retrieving implicit user feedback have been implemented. Nevertheless, implicit feedback can be misleading or inefficient for the agent's learning. Herein, we propose a novel approach that reduces the amount of explicit feedback required by COM-MAB algorithms while providing levels of global accuracy and learning efficiency similar to those of classical competitive methods. We evaluate this approach to handling user feedback using three distinct strategies. Even when users return only a limited share of feedback (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
Sep-16-2020
- Country:
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Genre:
- Research Report > Promising Solution (1.00)
- Overview > Innovation (0.74)
- Technology: