Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective
Lucas Maystre, Daniel Russo, Yu Zhao
arXiv.org Artificial Intelligence
We study the problem of optimizing a recommender system for outcomes that occur over several weeks or months. We begin by drawing on reinforcement learning to formulate a comprehensive model of users' recurring relationships with a recommender system. Measurement, attribution, and coordination challenges complicate algorithm design. We describe careful modeling -- including a new representation of user state and key conditional independence assumptions -- which overcomes these challenges and leads to simple, testable recommender system prototypes. We apply our approach to a podcast recommender system that makes personalized recommendations to hundreds of millions of listeners. A/B tests demonstrate that purposefully optimizing for long-term outcomes leads to large performance gains over conventional approaches that optimize for short-term proxies.
Feb-28-2023
- Country:
  - North America
    - United States
      - District of Columbia > Washington (0.04)
      - New York > New York County > New York City (0.04)
      - California > Los Angeles County > Los Angeles (0.14)
      - Puerto Rico > San Juan > San Juan (0.04)
  - Asia
    - Middle East > Jordan (0.04)
    - Myanmar > Tanintharyi Region > Dawei (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.67)
    - New Finding (0.45)
- Industry:
  - Leisure & Entertainment (1.00)
  - Media > Music (0.46)