lsr
Credit-assigned Policy Gradient for Early Stage Retrieval in Two-stage Ranking
Kiyohara, Haruka, Curmei, Mihaela, Evnine, Ariel, Kalyanaraman, Shankar, Nir, Israel, Pop, Ana-Roxana, Razin, Nitzan, Dean, Sarah, Joachims, Thorsten, Weinsberg, Udi
Large-scale search, recommendation, and retrieval-augmented generation (RAG) systems typically employ a two-stage architecture: an early-stage ranker (ESR) generates a candidate set, which is subsequently re-ranked by a late-stage ranker (LSR). While there are many reinforcement learning (RL) methods for training the LSR, end-to-end training of the ESR has proven challenging. In particular, naive application of "vanilla" policy gradient (V-PG) is not scalable for candidate-set sizes relevant for practical use due to exploding variance. This issue arises because V-PG propagates the gradient to the joint probability of the candidate sets, ignoring the contribution of each specific item in the candidate set to the reward. To mitigate this issue, we propose a novel "credit-assigned" policy gradient (CA-PG), which computes gradients with respect to the probability that the target item is chosen in any candidate set, i.e. marginalizing over all candidate sets that contain it. Our theoretical analysis reveals that CA-PG significantly reduces the variance of V-PG by marginalizing over the specific composition of the candidate set, while preserving the ability to learn the correct ranking of items under a reasonably aligned LSR policy. Experiments on both synthetic and real-world data demonstrate that CA-PG improves the convergence speed and training stability for ESRs utilizing the canonical Plackett-Luce model, especially when the candidate-set size is large.
SMART-3D: Three-Dimensional Self-Morphing Adaptive Replanning Tree
Agrawal, Priyanshu, Gupta, Shalabh, Shen, Zongyuan
Abstract--This paper presents SMART -3D, an extension of the SMART algorithm to 3D environments. SMART -3D is a tree-based adaptive replanning algorithm for dynamic environments with fast moving obstacles. SMART -3D morphs the underlying tree to find a new path in real-time whenever the current path is blocked by obstacles. SMART -3D removed the grid decomposition requirement of the SMART algorithm by replacing the concept of hot-spots with that of hot-nodes, thus making it computationally efficient and scalable to 3D environments. The hot-nodes are nodes which allow for efficient reconnections to morph the existing tree to find a new safe and reliable path. The performance of SMART -3D is evaluated by extensive simulations in 2D and 3D environments populated with randomly moving dynamic obstacles. The results show that SMART -3D achieves high success rates and low replanning times, thus highlighting its suitability for real-time onboard applications. Recent decades have seen significant growth of autonomous robots in supporting a diverse range of human operations.
3DGS_LSR:Large_Scale Relocation for Autonomous Driving Based on 3D Gaussian Splatting
Lu, Haitao, Chen, Haijier, Liu, Haoze, Zhang, Shoujian, Xu, Bo, Liu, Ziao
In autonomous robotic systems, precise localization is a prerequisite for safe navigation. However, in complex urban environments, GNSS positioning often suffers from signal occlusion and multipath effects, leading to unreliable absolute positioning. Traditional mapping approaches are constrained by storage requirements and computational inefficiency, limiting their applicability to resource-constrained robotic platforms. To address these challenges, we propose 3DGS-LSR: a large-scale relocalization framework leveraging 3D Gaussian Splatting (3DGS), enabling centimeter-level positioning using only a single monocular RGB image on the client side. We combine multi-sensor data to construct high-accuracy 3DGS maps in large outdoor scenes, while the robot-side localization requires just a standard camera input. Using SuperPoint and SuperGlue for feature extraction and matching, our core innovation is an iterative optimization strategy that refines localization results through step-by-step rendering, making it suitable for real-time autonomous navigation. Experimental validation on the KITTI dataset demonstrates our 3DGS-LSR achieves average positioning accuracies of 0.026m, 0.029m, and 0.081m in town roads, boulevard roads, and traffic-dense highways respectively, significantly outperforming other representative methods while requiring only monocular RGB input. This approach provides autonomous robots with reliable localization capabilities even in challenging urban environments where GNSS fails.
From Haystack to Needle: Label Space Reduction for Zero-shot Classification
Vandemoortele, Nathan, Steenwinckel, Bram, Ongenae, Femke, Van Hoecke, Sofie
We present Label Space Reduction (LSR), a novel method for improving zero-shot classification performance of Large Language Models (LLMs). LSR iteratively refines the classification label space by systematically ranking and reducing candidate classes, enabling the model to concentrate on the most relevant options. By leveraging unlabeled data with the statistical learning capabilities of data-driven models, LSR dynamically optimizes the label space representation at test time. Our experiments across seven benchmarks demonstrate that LSR improves macro-F1 scores by an average of 7.0% (up to 14.2%) with Llama-3.1-70B and 3.3% (up to 11.1%) with Claude-3.5-Sonnet compared to standard zero-shot classification baselines. To reduce the computational overhead of LSR, which requires an additional LLM call at each iteration, we propose distilling the model into a probabilistic classifier, allowing for efficient inference.
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Chen, Yu, Zhang, Xiangcheng, Wang, Siwei, Huang, Longbo
Reinforcement learning (RL) [43] has emerged as a powerful framework for sequential decision-making in dynamic and uncertain environments. While traditional RL methods, predominantly focused on maximizing the expected return, have seen significant advancements through approaches such as Q-learning [37, 25] and policy gradients [28, 10], they often fall short in real-world scenarios demanding strict risk control, such as financial investment [9], medical treatment [16], and automous driving [11]. The significance of comprehending risk management in RL has led to the emergence of Risk-Sensitive RL (RSRL). Unlike risk-neutral RL, which primarily focuses on maximizing expected returns, RSRL seeks to optimize risk metrics, such as entropy risk measures (ERM) [17, 18] or conditional value-at-risk (CVaR) [46], of the possible cumulative reward which emphasizes its distributional characteristics. However, traditional RL framework based on Q-learning which typically considers the mean of reward-to-go and corresponding Bellman equation, cannot efficiently capture the characteristics of the cumulative reward's distribution. Therefore, there has been an upsurge of interest in Distributional RL (DisRL) due to its capacity to understand the intrinsic distributional attributes of cumulative rewards, which has already achieved significant empirical success in risk-sensitive tasks [8, 14, 30, 45, 34].
Time-Optimal Path Planning in a Constant Wind for Uncrewed Aerial Vehicles using Dubins Set Classification
Moon, Brady, Sachdev, Sagar, Yuan, Junbin, Scherer, Sebastian
Time-optimal path planning in high winds for a turning-rate constrained UAV is a challenging problem to solve and is important for deployment and field operations. Previous works have used trochoidal path segments comprising straight and maximum-rate turn segments, as optimal extremal paths in uniform wind conditions. Current methods iterate over all candidate trochoidal trajectory types and select the one that is time-optimal; however, this exhaustive search can be computationally slow. In this paper, we introduce a method to decrease the computation time. This is achieved by reducing the number of candidate trochoidal trajectory types by framing the problem in the air-relative frame and bounding the solution within a subset of candidate trajectories. Our method reduces overall computation by 37.4% compared to pre-existing methods in Bang-Straight-Bang trajectories, freeing up computation for other onboard processes and can lead to significant total computational reductions when solving many trochoidal paths. When used within the framework of a global path planner, faster state expansions help find solutions faster or compute higher-quality paths. We also release our open-source codebase as a C++ package. The website and demo can be bound at https://bradymoon.com/trochoids, codebase at https://github.com/castacks/trochoids, and video at https://youtu.be/qOU5gI7JshI .
Dubins Curve Based Continuous-Curvature Trajectory Planning for Autonomous Mobile Robots
AMR is widely used in factories to replace manual labor to reduce costs and improve efficiency. However, it is often difficult for logistics robots to plan the optimal trajectory and unreasonable trajectory planning can lead to low transport efficiency and high energy consumption. In this paper, we propose a method to directly calculate the optimal trajectory for short distance on the basis of the Dubins set, which completes the calculation of the Dubins path. Additionally, as an improvement of Dubins path, we smooth the Dubins path based on clothoid, which makes the curvature varies linearly. AMR can adjust the steering wheels while following this trajectory. The experiments show that the Dubins path can be calculated quickly and well smoothed.
Adapting Learned Sparse Retrieval for Long Documents
Nguyen, Thong, MacAvaney, Sean, Yates, Andrew
Learned sparse retrieval (LSR) is a family of neural retrieval methods that transform queries and documents into sparse weight vectors aligned with a vocabulary. While LSR approaches like Splade work well for short passages, it is unclear how well they handle longer documents. We investigate existing aggregation approaches for adapting LSR to longer documents and find that proximal scoring is crucial for LSR to handle long documents. To leverage this property, we proposed two adaptations of the Sequential Dependence Model (SDM) to LSR: ExactSDM and SoftSDM. ExactSDM assumes only exact query term dependence, while SoftSDM uses potential functions that model the dependence of query terms and their expansion terms (i.e., terms identified using a transformer's masked language modeling head). Experiments on the MSMARCO Document and TREC Robust04 datasets demonstrate that both ExactSDM and SoftSDM outperform existing LSR aggregation approaches for different document length constraints. Surprisingly, SoftSDM does not provide any performance benefits over ExactSDM. This suggests that soft proximity matching is not necessary for modeling term dependence in LSR. Overall, this study provides insights into handling long documents with LSR, proposing adaptations that improve its performance.