position-based model
Position-Based Flocking for Robust Alignment
This paper presents a position-based flocking model for interacting agents that balances cohesion-separation and alignment to achieve stable collective motion. The model modifies a position-velocity-based approach by approximating velocity differences from initial and current positions, and it introduces a threshold weight to sustain alignment. Simulations with 50 agents in 2D show that the position-based model produces stronger alignment and more rigid, compact formations than the position-velocity-based model. The alignment metric and separation distances support the efficacy of the proposed model in achieving robust flocking behavior, with applications in robotics and collective dynamics.
Adversarial Attacks on Online Learning to Rank with Stochastic Click Models
Wang, Zichen, Balasubramanian, Rishab, Yuan, Hui, Song, Chenyu, Wang, Mengdi, Wang, Huazheng
Online learning to rank (OLTR) (Grotov and de Rijke, 2016) formulates learning to rank (Liu et al., 2009), the core problem in information retrieval, as a sequential decision-making problem. OLTR is a family of online learning solutions that exploit implicit feedback from users (e.g., clicks) to directly optimize parameterized rankers on the fly. It has drawn increasing attention in recent years (Kveton et al., 2015a; Zoghi et al., 2017; Lattimore et al., 2018; Oosterhuis and de Rijke, 2018; Wang et al., 2019; Jia et al., 2021) due to its advantages over traditional offline learning-based solutions and its numerous applications in web search and recommender systems (Liu et al., 2009). To use click feedback effectively to improve the quality of ranked lists, one line of OLTR work has studied bandit-based algorithms under different click models. In each iteration, the algorithm presents a ranked list of K items selected from L candidates based on its estimate of the user's interests. The ranker then observes the user's click feedback and updates its estimates accordingly. Different users may examine and click on the ranked list differently, and the way a user interacts with the item list is called the click model. Many works have been dedicated to establishing OLTR algorithms in the cascade model (Kveton et al., 2015a,b; Zong et al., 2016; Li et al., 2016; Vial et al.,
Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model
Buchholz, Alexander, London, Ben, di Benedetto, Giuseppe, Joachims, Thorsten
As the underlying ranking policies constantly evolve, recommendation providers need to experiment offline with new approaches for ranking content before actually deploying and exposing them to the users [5, 6]. This serves the purpose of deploying only policies that have a large chance of improving the user experience. Deployed ranking policies provide a plethora of interaction logs that can be repurposed to learn and evaluate potentially better policies offline. These logs come in the form of implicit feedback, i.e., records of past interaction behavior, linked to information about the user, the context and the items to recommend. Off-policy evaluation of new policies on historic data requires adequate strategies to deal with biases coming from (i) the nature of user interaction and (ii) the logging policy. A prominent example of these biases is position bias [7] (content that is not ranked in the most visible positions is less likely to be seen). We focus on two popular classes of estimators that take different approaches to correcting for presentation bias. The first class, in the case of full visibility of all items, does not rely on explicit randomization, but models the randomness in user behavior. The most common model is the position-based model (PBM), which assumes that observed clicks on content factorize into relevance (depending on the item only) and the visibility of the content (depending on the position only).
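The PBM factorization described above can be illustrated with a minimal sketch. The item names, relevance values, and examination probabilities below are hypothetical and chosen only for illustration; they do not come from the paper. Under the stated assumption, the click probability of the item at position k is the product of a position-only examination probability and an item-only relevance.

```python
import random


def pbm_click_probs(ranking, relevance, examination):
    """Click probability per position under the PBM assumption:
    P(click at position k) = examination[k] * relevance[item at k]."""
    return [examination[k] * relevance[item] for k, item in enumerate(ranking)]


def simulate_clicks(ranking, relevance, examination, rng):
    """Draw one independent Bernoulli click per position."""
    probs = pbm_click_probs(ranking, relevance, examination)
    return [int(rng.random() < p) for p in probs]


# Hypothetical values (assumptions for illustration only).
relevance = {"a": 0.9, "b": 0.5, "c": 0.2}   # depends on the item only
examination = [1.0, 0.5, 0.25]               # depends on the position only
ranking = ["a", "b", "c"]

probs = pbm_click_probs(ranking, relevance, examination)
clicks = simulate_clicks(ranking, relevance, examination, random.Random(0))
print(probs)
print(clicks)
```

Swapping two items in `ranking` changes their click probabilities only through the examination term, which is the position bias that such estimators aim to correct for.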
TopRank: A practical algorithm for online stochastic ranking
Lattimore, Tor, Kveton, Branislav, Li, Shuai, Szepesvari, Csaba
Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user. Many sample-efficient algorithms have been proposed for this problem that assume a specific click model connecting rankings and user behavior. We propose a generalized click model that encompasses many existing models, including the position-based and cascade models. Our generalization motivates a novel online learning algorithm based on topological sort, which we call TopRank. TopRank (a) is more natural than existing algorithms, (b) has stronger regret guarantees than existing algorithms with comparable generality, (c) has a more insightful proof that leaves the door open to many generalizations, and (d) outperforms existing algorithms empirically.
Consistent Position Bias Estimation without Online Interventions for Learning-to-Rank
Agarwal, Aman, Zaitsev, Ivan, Joachims, Thorsten
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors. While it was recently shown how counterfactual learning-to-rank (LTR) approaches (Joachims et al., 2017) can provably overcome presentation bias if observation propensities are known, it remains to show how to accurately estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance modeling assumptions. We merely require that we have implicit feedback data from multiple different ranking functions. Furthermore, we argue that our estimation technique applies to an extended class of Contextual Position-Based Propensity Models, where propensities not only depend on position but also on observable features of the query and document. Initial simulation studies confirm that the approach is scalable, accurate, and robust.