Robust Reinforcement Learning Objectives for Sequential Recommender Systems
Mozifian, Melissa, Sylvain, Tristan, Evans, Dave, Meng, Lili
–arXiv.org Artificial Intelligence
Attention-based sequential recommendation methods have demonstrated promising results by accurately capturing users' dynamic interests from historical interactions. In addition to generating superior user representations, recent studies have begun integrating reinforcement learning (RL) into these models. Framing sequential recommendation as an RL problem with reward signals makes it possible to develop recommender systems (RS) that address a vital aspect: incorporating direct user feedback in the form of rewards to deliver a more personalized experience. Nonetheless, employing RL algorithms presents challenges, including off-policy training, expansive combinatorial action spaces, and the scarcity of datasets with sufficient reward signals. Contemporary approaches have attempted to combine RL and sequential modeling, incorporating contrastive-based objectives and negative sampling strategies for training the RL component. In this study, we further emphasize the efficacy of contrastive-based objectives paired with augmentation to address datasets with extended horizons. Additionally, we recognize the potential instability issues that may arise during the application of negative sampling. These challenges primarily stem from the data imbalance prevalent in real-world datasets, which is a common issue in offline RL contexts. While our established baselines attempt to mitigate this through various techniques, instability remains an issue. Therefore, we introduce an enhanced methodology aimed at providing a more effective solution to these challenges.
May-30-2023
- Country:
  - Asia > China
    - Hong Kong (0.04)
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - Dominican Republic (0.04)
    - United States
      - New York > New York County
        - New York City (0.04)
      - Washington > King County
        - Seattle (0.04)
- Genre:
- Research Report > New Finding (0.48)
- Technology: