trajectory length
Learning Shortest Paths with Generative Flow Networks
Morozov, Nikita, Maksimov, Ian, Tiapkin, Daniil, Samsonov, Sergey
In this paper, we present a novel learning framework for finding shortest paths in graphs utilizing Generative Flow Networks (GFlowNets). First, we examine theoretical properties of GFlowNets in non-acyclic environments in relation to shortest paths. We prove that, if the total flow is minimized, forward and backward policies traverse the environment graph exclusively along shortest paths between the initial and terminal states. Building on this result, we show that the pathfinding problem in an arbitrary graph can be solved by training a non-acyclic GFlowNet with flow regularization. We experimentally demonstrate the performance of our method in pathfinding in permutation environments and in solving Rubik's Cubes. For the latter problem, our approach shows competitive results with state-of-the-art machine learning approaches designed specifically for this task in terms of the solution length, while requiring smaller search budget at test-time.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Surgery (0.93)
- Education (0.68)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Know your Trajectory -- Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis
F, Clifford, Jay, Devika, Sarkar, Abhishek, Perepu, Satheesh K, S, Santhosh G, Dey, Kaushik, Ravindran, Balaraman
As Reinforcement Learning (RL) agents are increasingly deployed in real-world applications, ensuring their behavior is transparent and trustworthy is paramount. A key component of trust is explainability, yet much of the work in Explainable RL (XRL) focuses on local, single-step decisions. This paper addresses the critical need for explaining an agent's long-term behavior through trajectory-level analysis. We introduce a novel framework that ranks entire trajectories by defining and aggregating a new state-importance metric. This metric combines the classic Q-value difference with a "radical term" that captures the agent's affinity to reach its goal, providing a more nuanced measure of state criticality. We demonstrate that our method successfully identifies optimal trajectories from a heterogeneous collection of agent experiences. Furthermore, by generating counterfactual rollouts from critical states within these trajectories, we show that the agent's chosen path is robustly superior to alternatives, thereby providing a powerful "Why this, and not that?" explanation. Our experiments in standard OpenAI Gym environments validate that our proposed importance metric is more effective at identifying optimal behaviors compared to classic approaches, offering a significant step towards trustworthy autonomous systems.
Search at Scale: Improving Numerical Conditioning of Ergodic Coverage Optimization for Multi-Scale Domains
Lahrach, Yanis, Hughes, Christian, Abraham, Ian
Recent methods in ergodic coverage planning have shown promise as tools that can adapt to a wide range of geometric coverage problems with general constraints, but are highly sensitive to the numerical scaling of the problem space. The underlying challenge is that the optimization formulation becomes brittle and numerically unstable with changing scales, especially under potentially nonlinear constraints that impose dynamic restrictions, due to the kernel-based formulation. This paper proposes to address this problem via the development of a scale-agnostic and adaptive ergodic coverage optimization method based on the maximum mean discrepancy metric (MMD). Our approach allows the optimizer to solve for the scale of differential constraints while annealing the hyperparameters to best suit the problem domain and ensure physical consistency. We also derive a variation of the ergodic metric in the log space, providing additional numerical conditioning without loss of performance. We compare our approach with existing coverage planning methods and demonstrate the utility of our approach on a wide range of coverage problems.
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- Europe > Belgium > Wallonia > Walloon Brabant > Louvain-la-Neuve (0.04)
- Atlantic Ocean (0.04)
- Asia > Philippines (0.04)
REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
Thompson, Jacob, Garcia-Lopez, Emiliano, Bisk, Yonatan
Humans build viewpoint-independent cognitive maps through navigation, enabling intuitive reasoning about object permanence and spatial relations. We argue that multimodal large language models (MLLMs), despite extensive video training, lack this fundamental spatial reasoning capability, a critical limitation for embodied applications. To demonstrate these limitations and drive research, we introduce REM (Reasoning over Embodied Multi-Frame Trajectories), a benchmark using controllable 3D environments for long-horizon embodied spatial reasoning. REM systematically evaluates key aspects like object permanence/distinction, spatial relationships, and numerical tracking across dynamic embodied viewpoints. Our evaluation shows that the best-performing current models exhibit promising overall performance, but become increasingly unreliable at even moderate complexity levels easily handled by humans. These findings highlight challenges MLLMs face in developing robust spatial representations from sequential visual input. Consequently, REM provides targeted metrics and diagnostics to foster improved spatial understanding in future models.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Extendable Planning via Multiscale Diffusion
Chen, Chang, Hamed, Hany, Baek, Doojin, Kang, Taegu, Noh, Samyeul, Bengio, Yoshua, Ahn, Sungjin
Long-horizon planning is crucial in complex environments, but diffusion-based planners like Diffuser are limited by the trajectory lengths observed during training. This creates a dilemma: long trajectories are needed for effective planning, yet they degrade model performance. In this paper, we introduce this extendable long-horizon planning challenge and propose a two-phase solution. First, Progressive Trajectory Extension incrementally constructs longer trajectories through multi-round compositional stitching. Second, the Hierarchical Multiscale Diffuser enables efficient training and inference over long horizons by reasoning across temporal scales. To avoid the need for multiple separate models, we propose Adaptive Plan Pondering and the Recursive HM-Diffuser, which unify hierarchical planning within a single model. Experiments show our approach yields strong performance gains, advancing scalable and efficient decision-making over long-horizons.
- North America > United States > Montana (0.04)
- Europe > Italy > Liguria > Genoa (0.04)
SafeMIL: Learning Offline Safe Imitation Policy from Non-Preferred Trajectories
Burnwal, Returaj, Bhatt, Nirav Pravinbhai, Ravindran, Balaraman
In this work, we study the problem of offline safe imitation learning (IL). In many real-world settings, online interactions can be risky, and accurately specifying the reward and the safety cost information at each timestep can be difficult. However, it is often feasible to collect trajectories reflecting undesirable or risky behavior, implicitly conveying the behavior the agent should avoid. We refer to these trajectories as non-preferred trajectories. Unlike standard IL, which aims to mimic demonstrations, our agent must also learn to avoid risky behavior using non-preferred trajectories. In this paper, we propose a novel approach, SafeMIL, to learn a parame-terized cost that predicts if the state-action pair is risky via Multiple Instance Learning. The learned cost is then used to avoid non-preferred behaviors, resulting in a policy that prioritizes safety. We empirically demonstrate that our approach can learn a safer policy that satisfies cost constraints without degrading the reward performance, thereby outperforming several baselines.
- North America > United States (0.14)
- Asia > India (0.04)
Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories
Majgaonkar, Oorja, Fei, Zhiwei, Li, Xiang, Sarro, Federica, Ye, He
The increasing deployment of Large Language Model (LLM) agents for complex software engineering tasks has created a need to understand their problem-solving behaviours beyond simple success metrics. While these agents demonstrate impressive capabilities in automated issue resolution, their decision-making processes remain largely opaque. This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues. We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts. Our investigation reveals several key insights into agent behaviour. First, we identify how distinct problem-solving strategies, such as defensive programming and context gathering, enable success in different scenarios. Second, we find that failed trajectories are consistently longer and exhibit higher variance than successful ones, with failure patterns differing significantly between agents. Third, our fault localisation analysis shows that while most trajectories correctly identify problematic files (72-81\% even in failures), success depends more on achieving approximate rather than exact code modifications. These and other findings unveiled by our study, provide a foundation for understanding agent behaviour through trajectory analysis, contributing to the development of more robust and interpretable autonomous software engineering systems.
- Europe > United Kingdom > England > Greater London > London (0.41)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
- Research Report > Experimental Study (0.66)
- Research Report > New Finding (0.46)
WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation
Qian, Yaoyao, Wang, Yuanli, Zhang, Jinda, Zong, Yun, Chen, Meixu, Zhou, Hanhan, Huang, Jindan, Zeng, Yifan, Hu, Xinyu, Song, Chan Hee, Zhang, Danqing
Current evaluation of web agents largely reduces to binary success metrics or conformity to a single reference trajectory, ignoring the structural diversity present in benchmark datasets. We present WebGraphEval, a framework that abstracts trajectories from multiple agents into a unified, weighted action graph. This representation is directly compatible with benchmarks such as WebArena, leveraging leaderboard runs and newly collected trajectories without modifying environments. The framework canonically encodes actions, merges recurring behaviors, and applies structural analyses including reward propagation and success-weighted edge statistics. Evaluations across thousands of trajectories from six web agents show that the graph abstraction captures cross-model regularities, highlights redundancy and inefficiency, and identifies critical decision points overlooked by outcome-based metrics. By framing web interaction as graph-structured data, WebGraphEval establishes a general methodology for multi-path, cross-agent, and efficiency-aware evaluation of web agents.
- North America > United States > Texas (0.04)
- North America > United States > Oregon (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Minnesota (0.04)