redundant action
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning
Chen, Ziyuan, Zhao, Zhenghui, Han, Zhangye, Liu, Miancan, Ye, Xianhang, Li, Yiqing, Min, Hongbo, Ren, Jinkui, Zhang, Xiantao, Cao, Guitao
With the rapid advancement of large language models and vision-language models, employing large models as Web Agents has become essential for automated web interaction. However, training Web Agents with reinforcement learning faces critical challenges, including credit assignment misallocation, prohibitively high annotation costs, and reward sparsity. To address these issues, we propose Tree-Guided Preference Optimization (TGPO), an offline reinforcement learning framework built on a tree-structured trajectory representation that merges semantically identical states across trajectories to eliminate label conflicts. Our framework incorporates a Process Reward Model that automatically generates fine-grained rewards through subgoal progress, redundancy detection, and action verification. Additionally, a dynamic weighting mechanism prioritizes high-impact decision points during training. Experiments on Online-Mind2Web and our self-constructed C-WebShop datasets demonstrate that TGPO significantly outperforms existing methods, achieving higher success rates with fewer redundant steps.
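The state-merging idea in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical `canonical` function that decides when two observed states are semantically identical (here only whitespace and case normalization, far simpler than whatever matching TGPO uses), pools trajectory outcomes per merged (state, action) node, and derives preference pairs weighted by the success-rate gap. That weight is an assumed, illustrative stand-in for the paper's dynamic weighting, and no Process Reward Model is modeled.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence

# One step of a trajectory: (raw state description, action taken).
Step = tuple[str, str]


def merge_trajectories(
    trajectories: Sequence[tuple[Sequence[Step], bool]],
    canonical: Callable[[str], Hashable],
) -> dict[tuple[Hashable, str], tuple[int, int]]:
    """Merge steps whose states share a canonical key and pool their outcomes.

    Returns {(state_key, action): (successes, visits)}. `canonical` is a
    hypothetical stand-in for semantic state matching.
    """
    stats: dict[tuple[Hashable, str], list[int]] = defaultdict(lambda: [0, 0])
    for steps, success in trajectories:
        for state, action in steps:
            entry = stats[(canonical(state), action)]
            entry[0] += int(success)
            entry[1] += 1
    return {key: (s, n) for key, (s, n) in stats.items()}


def preference_pairs(stats):
    """Emit (state, preferred, rejected, weight) tuples for each merged state.

    The weight is the success-rate gap between the two actions, an assumed
    proxy for how much the decision at this node matters.
    """
    by_state = defaultdict(list)
    for (key, action), (s, n) in stats.items():
        by_state[key].append((s / n, action))
    pairs = []
    for key, scored in by_state.items():
        scored.sort(reverse=True)  # highest success rate first
        for i, (rate_i, act_i) in enumerate(scored):
            for rate_j, act_j in scored[i + 1:]:
                if rate_i > rate_j:
                    pairs.append((key, act_i, act_j, rate_i - rate_j))
    return pairs


trajectories = [
    ([("Home", "search"), ("Results", "click_item")], True),
    ([("home ", "search"), ("results", "go_back")], False),
    ([("HOME", "open_menu")], False),
]
stats = merge_trajectories(trajectories, canonical=lambda s: s.strip().lower())
pairs = preference_pairs(stats)
```

Per-trajectory labels would mark `search` on the home page as both good and bad; after merging, that node gets a single pooled estimate (1 success in 2 visits), which is the kind of label conflict the merged representation is meant to remove.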
VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots
Grigorev, Danil S., Kovalev, Alexey K., Panov, Aleksandr I.
In the field of robotics, researchers face a critical challenge in ensuring reliable and efficient task planning. Verifying high-level task plans before execution significantly reduces errors and enhances the overall performance of these systems. In this paper, we propose an architecture for automatically verifying high-level task plans before their execution in simulated or real-world environments. Leveraging Large Language Models (LLMs), our approach consists of two key steps: first, the conversion of natural language instructions into Linear Temporal Logic (LTL), followed by a comprehensive analysis of action sequences. The verification module uses the reasoning capabilities of the LLM to evaluate logical coherence and identify potential gaps in the plan. Rigorous testing on datasets of varying complexity demonstrates the broad applicability of the module to household tasks. We contribute to improving the reliability and efficiency of task planning and address the critical need for robust pre-execution verification in autonomous systems. The code is available at https://verifyllm.github.io.
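The abstract does not specify how LTL formulas are checked against a plan, so the sketch below assumes a conventional finite-trace LTL evaluator and a hypothetical encoding in which each plan step makes only its own action name true. It illustrates what an LTL constraint over an action sequence means; it is not VerifyLLM's LLM-based analysis, and the formula is written by hand rather than produced from natural language.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Eventually:
    sub: "Formula"


@dataclass(frozen=True)
class Always:
    sub: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, And, Eventually, Always, Until]


def holds(f: Formula, trace: list[set[str]], i: int = 0) -> bool:
    """Evaluate formula `f` at position `i` of a finite trace (LTLf semantics)."""
    n = len(trace)
    if isinstance(f, Atom):
        return i < n and f.name in trace[i]
    if isinstance(f, Not):
        return not holds(f.sub, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Eventually):
        return any(holds(f.sub, trace, j) for j in range(i, n))
    if isinstance(f, Always):
        return all(holds(f.sub, trace, j) for j in range(i, n))
    if isinstance(f, Until):
        # Strong until: `right` must occur, and `left` must hold at every step before it.
        for j in range(i, n):
            if holds(f.right, trace, j):
                return True
            if not holds(f.left, trace, j):
                return False
        return False
    raise TypeError(f"unknown formula: {f!r}")


def to_trace(actions: list[str]) -> list[set[str]]:
    # Hypothetical encoding: each step makes only its own action name true.
    return [{a} for a in actions]


# "Do not place the apple until the fridge is open, and eventually close it."
spec = And(Until(Not(Atom("place_apple_in_fridge")), Atom("open_fridge")),
           Eventually(Atom("close_fridge")))
good_plan = to_trace(["open_fridge", "pick_apple", "place_apple_in_fridge", "close_fridge"])
bad_plan = to_trace(["pick_apple", "place_apple_in_fridge", "open_fridge"])
```

Here `holds(spec, good_plan)` is true, while `bad_plan` violates the ordering constraint because it places the apple before the fridge is opened.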
Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation
Liu, Wenzhang, Jin, Lianjun, Ren, Lu, Mu, Chaoxu, Sun, Changyin
Intelligent decision-making within large and redundant action spaces remains challenging in deep reinforcement learning. Considering similar but ineffective actions at each step can lead to repetitive and unproductive trials. Existing methods attempt to improve agent exploration by reducing or penalizing redundant actions, yet they fail to provide quantitative and reliable evidence to determine redundancy. In this paper, we propose a method to improve exploration efficiency by estimating the causal effects of actions. Unlike prior methods, our approach offers quantitative results regarding the causality of actions for one-step transitions. We first pre-train an inverse dynamics model to serve as prior knowledge of the environment. Subsequently, we classify actions across the entire action space at each time step and estimate the causal effect of each action to suppress redundant actions during exploration. We provide a theoretical analysis to demonstrate the effectiveness of our method and present empirical results from simulations in environments with redundant actions to evaluate its performance. Our implementation is available at https://github.com/agi-brain/cee.git.
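A toy sketch of the redundancy idea, assuming a deterministic, fully observed environment. It is not the paper's causal-effect estimator: the pre-trained inverse dynamics model is replaced by a lookup table from observed (s, s') pairs to the actions that produced them, and two actions are treated as having the same causal effect when they lead to the same next state. The environment, action names, and function names are hypothetical.

```python
from collections import defaultdict

# Toy 1-D world with positions 0..4 (hypothetical environment).
ACTIONS = ["left", "right", "right_alt", "stay", "wait"]


def step(s: int, a: str) -> int:
    if a == "left":
        return max(s - 1, 0)
    if a in ("right", "right_alt"):
        return min(s + 1, 4)
    return s  # "stay" and "wait" leave the state unchanged


def build_inverse_model(transitions):
    """Tabular stand-in for an inverse dynamics model: (s, s') -> {actions}."""
    inv = defaultdict(set)
    for s, a, s_next in transitions:
        inv[(s, s_next)].add(a)
    return inv


def allowed_actions(state, inv):
    """Keep one representative action per distinct one-step effect at `state`.

    Actions sharing an (s, s') entry are treated as having the same effect;
    the no-change effect s' == s is suppressed unless it is the only one.
    """
    groups = {nxt: acts for (s, nxt), acts in inv.items() if s == state}
    effective = {nxt: acts for nxt, acts in groups.items() if nxt != state}
    chosen = effective or groups
    return sorted(min(acts) for acts in chosen.values())


transitions = [(s, a, step(s, a)) for s in range(5) for a in ACTIONS]
inv = build_inverse_model(transitions)
```

At position 0 the `left` action hits the boundary and becomes a no-op, so it is masked there but kept at position 2; redundancy in this sense depends on the state, which is why the classification is redone at each time step.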
No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning
Zhong, Dianyu, Yang, Yiqin, Zhao, Qianchuan
The large action space is one fundamental obstacle to deploying Reinforcement Learning methods in the real world. Numerous redundant actions cause agents to make repeated or invalid attempts, even leading to task failure. Although current algorithms conduct some initial explorations of this issue, they either rely on rule-based systems or depend on expert demonstrations, which significantly limits their applicability in many real-world settings. In this work, we theoretically analyze which actions can be eliminated in policy optimization and propose a novel redundant action filtering mechanism. Unlike other works, our method constructs the similarity factor by estimating the distance between state distributions, which requires no prior knowledge. In addition, we incorporate a modified inverse model to avoid extensive computation in high-dimensional state spaces. We reveal the underlying structure of action spaces and, based on the above techniques, propose a simple yet efficient redundant action filtering mechanism named No Prior Mask (NPM). We demonstrate the superior performance of our method through extensive experiments on high-dimensional, pixel-input, and stochastic problems with various degrees of action redundancy. Our code is publicly available at https://github.com/zhongdy15/npm.
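For stochastic dynamics, actions can be compared through the next-state distributions they induce. The sketch below assumes empirical distributions estimated from samples at a single state, total variation distance as the similarity measure, and a hypothetical threshold `eps`; the abstract does not say which distance NPM uses, and the modified inverse model is omitted entirely.

```python
from collections import Counter


def empirical_dist(samples):
    """Empirical next-state distribution from a list of sampled states."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def filter_redundant(next_state_samples, eps):
    """Greedily keep actions whose next-state distributions differ by more than eps.

    next_state_samples: {action: [sampled next states]} collected at one state.
    """
    dists = {a: empirical_dist(xs) for a, xs in next_state_samples.items()}
    kept = []
    for a in sorted(dists):
        if all(tv_distance(dists[a], dists[b]) > eps for b in kept):
            kept.append(a)
    return kept


samples = {
    "up": ["A"] * 8 + ["B"] * 2,
    "up2": ["A"] * 7 + ["B"] * 3,
    "down": ["C"] * 9 + ["A"] * 1,
}
kept = filter_redundant(samples, eps=0.15)
```

The greedy pass keeps an action only if it is farther than `eps` from every action already kept, so `up2` (distance about 0.1 from `up`) is filtered out and `kept` is `["down", "up"]`.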
Just-in-Time Backfilling in Multi-Agent Scheduling
Gallagher, Anthony (Carnegie Mellon University) | Hunsberger, Luke (Vassar College) | Smith, Stephen F. (Carnegie Mellon University)
This paper addresses the problem of how a group of agents cooperating on a complex plan with interdependent actions can coordinate their scheduling and execution of those actions, particularly in domains where actions may fail or have uncertain durations. If actions fail (or fail to meet their deadlines), the repercussions for the rest of the team's plan can be dramatic. This paper presents a proactive strategy, called Just-in-Time Backfilling (JIT-BF), that agents can use to increase the fault tolerance of their interdependent schedules by identifying actions in danger of failing and inserting redundant (or back-up) actions into their schedules. The insertion of redundant actions can be done locally (i.e., by the agent whose action is in danger of failing) or through negotiations with the rest of the team. The computations performed by agents following the JIT-BF strategy depend on probabilistic models of action durations and the "quality" achieved by successfully executing actions. The paper presents an experimental evaluation of the JIT-BF strategy within a simulated real-time dynamic environment, demonstrating that teams using the proactive JIT-BF strategy significantly outperform teams that rely solely on reactive strategies.
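The "in danger of failing" test can be sketched with a discrete duration model. The following assumptions are hypothetical and not taken from the paper: each action has a start time, a probability distribution over durations, and an outright failure probability; a backup runs independently of the primary action; and the risk threshold defaults to 0.9. The paper's strategy also reasons about quality and can negotiate backups across the team, and it acts just in time during execution, none of which this one-shot check models.

```python
from dataclasses import dataclass


@dataclass
class ScheduledAction:
    name: str
    start: int
    durations: dict[int, float]  # duration -> probability (sums to 1)
    p_fail: float = 0.0  # probability the action fails outright

    def p_success(self, deadline: int) -> float:
        """Probability the action succeeds and finishes by `deadline`."""
        on_time = sum(p for d, p in self.durations.items() if self.start + d <= deadline)
        return (1.0 - self.p_fail) * on_time


def backfill(primary, deadline, candidates, threshold=0.9):
    """Return (backup or None, estimated success probability).

    Assumes the backup executes independently of the primary, so the task
    fails only if both fail.
    """
    p = primary.p_success(deadline)
    if p >= threshold or not candidates:
        return None, p
    backup = max(candidates, key=lambda c: c.p_success(deadline))
    combined = 1.0 - (1.0 - p) * (1.0 - backup.p_success(deadline))
    return backup, combined


primary = ScheduledAction("deliver", start=0, durations={4: 0.5, 6: 0.3, 9: 0.2}, p_fail=0.1)
candidates = [
    ScheduledAction("deliver_alt", start=2, durations={3: 0.6, 7: 0.4}),
    ScheduledAction("deliver_slow", start=5, durations={5: 1.0}),
]
backup, p_est = backfill(primary, deadline=8, candidates=candidates)
```

With a deadline of 8, `deliver` succeeds on time with probability about 0.72, below the threshold, so the sketch selects `deliver_alt` and the combined estimate rises to about 0.888.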