Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments
Neural Information Processing Systems
Delayed feedback is a critical problem in dynamic recommender systems. In practice, the feedback a user gives often depends on how frequently recommendations are sent. Most existing online learning literature does not consider optimizing the recommendation frequency, and treats the reward from every successfully recommended message as equal. In this paper, we consider a novel cascading bandits setting, where individual messages from a selected list are sent to a user periodically. Whenever a user does not like a message, she may abandon the system with a probability positively correlated with the recommendation frequency.
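The interaction described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's model or algorithm: the linear abandonment function, the `base` parameter, the names `abandon_probability` and `simulate_session`, and the choice to keep sending messages after one is liked are all hypothetical. The abstract specifies only that the abandonment probability is positively correlated with the recommendation frequency.

```python
import random


def abandon_probability(frequency, base=0.1):
    # Hypothetical form: grows linearly with frequency, capped at 1.
    # The source only says the probability is positively correlated
    # with the recommendation frequency.
    return min(1.0, base * frequency)


def simulate_session(attraction_probs, frequency, rng, base=0.1):
    """Send the messages of a list one at a time.

    attraction_probs[i] is the (assumed) probability that the user likes
    message i. Returns (number of liked messages, whether the user abandoned).
    """
    q = abandon_probability(frequency, base)
    liked = 0
    for p in attraction_probs:
        if rng.random() < p:
            liked += 1            # user likes this message
        elif rng.random() < q:
            return liked, True    # disliked message, user abandons
    return liked, False


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.6, 0.4, 0.3]       # made-up attraction probabilities
    for freq in (1, 3, 6):
        runs = [simulate_session(probs, freq, rng) for _ in range(10_000)]
        rate = sum(abandoned for _, abandoned in runs) / len(runs)
        print(f"frequency={freq}: empirical abandonment rate {rate:.3f}")
```

Under these assumptions, raising `frequency` raises the chance that a disliked message ends the session, which is the trade-off a frequency-aware learner would have to balance against delivering messages sooner.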
Feb-11-2025, 17:04:11 GMT