Adversarial Attacks on Online Learning to Rank with Stochastic Click Models

Zichen Wang, Rishab Balasubramanian, Hui Yuan, Chenyu Song, Mengdi Wang, Huazheng Wang

arXiv.org Artificial Intelligence 

Online learning to rank (OLTR) (Grotov and de Rijke, 2016) formulates learning to rank (Liu et al., 2009), the core problem in information retrieval, as a sequential decision-making problem. OLTR is a family of online learning solutions that exploit implicit feedback from users (e.g., clicks) to directly optimize parameterized rankers on the fly. It has drawn increasing attention in recent years (Kveton et al., 2015a; Zoghi et al., 2017; Lattimore et al., 2018; Oosterhuis and de Rijke, 2018; Wang et al., 2019; Jia et al., 2021) due to its advantages over traditional offline learning-based solutions and its numerous applications in web search and recommender systems (Liu et al., 2009).

To effectively utilize users' click feedback to improve the quality of ranked lists, one line of OLTR research studies bandit-based algorithms under different click models. In each iteration, the algorithm presents a ranked list of K items selected from L candidates based on its estimates of the user's interests. The ranker observes the user's click feedback and updates these estimates accordingly. Different users may examine and click on a ranked list differently, and the model of how the user interacts with the item list is called the click model. Many works have been dedicated to establishing OLTR algorithms under the cascade model (Kveton et al., 2015a,b; Zong et al., 2016; Li et al., 2016; Vial et al.,
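To make the bandit-based OLTR loop concrete, here is a minimal sketch in the spirit of CascadeUCB1 (Kveton et al., 2015a) under the cascade click model. All names and parameters are illustrative assumptions: `true_probs` is a hypothetical vector of per-item attraction probabilities used only to simulate user clicks, and the exploration constant is one common choice, not the paper's.

```python
import math
import random

def cascade_ucb1(L, K, true_probs, T, seed=0):
    """Sketch of a CascadeUCB1-style OLTR loop under the cascade click model.

    L: number of candidate items; K: length of the presented list;
    true_probs: hypothetical per-item click probabilities (simulated user);
    T: number of rounds. Returns the empirical attraction estimates.
    """
    rng = random.Random(seed)
    counts = [0] * L   # how many times each item was examined
    means = [0.0] * L  # empirical attraction (click) estimates

    for t in range(1, T + 1):
        # UCB score: empirical mean plus an exploration bonus.
        def ucb(i):
            if counts[i] == 0:
                return float("inf")
            return means[i] + math.sqrt(1.5 * math.log(t) / counts[i])

        # Present the K items with the highest upper confidence bounds.
        ranked = sorted(range(L), key=ucb, reverse=True)[:K]

        # Cascade model: the user scans top-down and clicks the first
        # attractive item; positions below the click are not examined.
        click_pos = None
        for pos, item in enumerate(ranked):
            if rng.random() < true_probs[item]:
                click_pos = pos
                break

        # Update estimates for every examined position.
        last = click_pos if click_pos is not None else K - 1
        for pos in range(last + 1):
            item = ranked[pos]
            counts[item] += 1
            reward = 1.0 if pos == click_pos else 0.0
            means[item] += (reward - means[item]) / counts[item]

    return means
```

The key cascade-model assumption shows up in the update: only items at or above the clicked position are treated as examined, so only they receive a reward signal in that round.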
