Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
Hongzhi Zang, Yulun Zhang, He Jiang, Zhe Chen, Daniel Harabor, Peter J. Stuckey, Jiaoyang Li
We study the problem of optimizing a guidance policy that dynamically guides agents in lifelong Multi-Agent Path Finding based on real-time traffic patterns. Multi-Agent Path Finding (MAPF) focuses on moving multiple agents from their starts to goals without collisions. Its lifelong variant, LMAPF, continuously assigns new goals to agents. In this work, we focus on improving the solution quality of PIBT, a state-of-the-art rule-based LMAPF algorithm, by optimizing a policy to generate adaptive guidance. We design two pipelines that incorporate guidance into PIBT in two different ways. We demonstrate the superiority of the optimized policy over both static guidance and human-designed policies. Additionally, we explore scenarios where the task distribution changes over time, a challenging yet common situation in real-world applications that is rarely explored in the literature.
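A common way to encode such guidance, which the sketch below assumes, is a guidance graph: per-edge cost multipliers that a learned policy updates from real-time traffic statistics and that PIBT consults in its distance-to-goal heuristic. The feature map, the linear policy `theta`, and the function names are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

# A minimal sketch, assuming guidance takes the form of a guidance graph:
# per-edge cost multipliers computed from real-time traffic features.

def guidance_weights(traffic_features: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Map per-edge traffic features (shape [E, F]) to edge weights >= 1;
    heavier recent traffic makes an edge more expensive to traverse."""
    return 1.0 + np.maximum(0.0, traffic_features @ theta)

def guided_heuristic(weighted_dist, weights):
    """PIBT ranks candidate moves by distance-to-goal; replacing unit edge
    costs with `weights` steers agents away from congested regions."""
    return lambda vertex, goal: weighted_dist(vertex, goal, edge_cost=weights)
```

Recomputing the weights every few timesteps is what makes the guidance dynamic rather than static.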
IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse
Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang
Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task. Transfer RL methods can reshape the policy optimization objective (optimization transfer) or influence the behavior policy (behavior transfer) using source policies. However, selecting the appropriate source policy with limited samples to guide target policy learning has been a challenge. Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions, which can lead to non-stationary policy optimization or heavy sampling costs, diminishing transfer effectiveness. To address this challenge, we propose a novel transfer RL method that selects the source policy without training extra components. Our method utilizes the Q function in the actor-critic framework to guide policy selection, choosing the source policy with the largest one-step improvement over the current target policy. We integrate optimization transfer and behavior transfer (IOB) by regularizing the learned policy to mimic the guidance policy and combining them as the behavior policy. This integration significantly enhances transfer effectiveness, surpasses state-of-the-art transfer RL baselines in benchmark tasks, and improves final performance and knowledge transferability in continual learning scenarios. Additionally, we show that our optimization transfer technique is guaranteed to improve target policy learning.
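As a rough illustration of the two mechanisms described above, the sketch below adds an imitation regularizer to a generic actor-critic loss (optimization transfer) and mixes the guidance policy into data collection (behavior transfer). The names `actor`, `critic`, and `guide_action`, and the MSE regularizer, are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def actor_loss(state, actor, critic, guide_action, beta=1.0):
    """Optimization transfer: the usual actor-critic objective plus a
    regularizer pulling the learned policy toward the guidance action."""
    action = actor.sample(state)                  # assumes reparameterized sampling
    rl_term = -critic(state, action).mean()       # maximize the critic's value
    imitation = F.mse_loss(action, guide_action)  # mimic the guidance policy
    return rl_term + beta * imitation

def behavior_action(state, actor, guide_action, eps=0.5):
    """Behavior transfer: mix the guidance policy into data collection."""
    if torch.rand(()).item() < eps:
        return guide_action           # act with the guidance policy
    return actor.sample(state)        # otherwise act with the learned policy
```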
CUP: Critic-Guided Policy Reuse
Jin Zhang, Siyuan Li, Chongjie Zhang
The ability to reuse previous policies is an important aspect of human intelligence. To achieve efficient policy reuse, a Deep Reinforcement Learning (DRL) agent needs to decide when to reuse and which source policies to reuse. Previous methods solve this problem by introducing extra components to the underlying algorithm, such as hierarchical high-level policies over source policies, or estimations of source policies' value functions on the target task. However, training these components induces either optimization non-stationarity or heavy sampling cost, significantly impairing the effectiveness of transfer. To tackle this problem, we propose a novel policy reuse algorithm called Critic-gUided Policy reuse (CUP), which avoids training any extra components and efficiently reuses source policies. CUP utilizes the critic, a common component in actor-critic methods, to evaluate and choose source policies. At each state, CUP chooses the source policy that has the largest one-step improvement over the current target policy, and forms a guidance policy. The guidance policy is theoretically guaranteed to be a monotonic improvement over the current target policy. Then the target policy is regularized to imitate the guidance policy to perform efficient policy search. Empirical results demonstrate that CUP achieves efficient transfer and significantly outperforms baseline algorithms.
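The selection step can be pictured as follows: a minimal sketch that, at each state, asks the critic to score one sampled action from the target policy and from each source policy, and returns the best-scoring action as the guidance action. The names (`q_net`, `.sample`) and the one-sample estimate are illustrative assumptions, not the authors' code.

```python
def guidance_action(state, target_policy, source_policies, q_net):
    """Among the target policy and all source policies, return the sampled
    action the critic scores highest: a one-sample estimate of the policy
    with the largest one-step improvement at this state."""
    best_action, best_q = None, -float("inf")
    for policy in [target_policy, *source_policies]:
        action = policy.sample(state)          # candidate action at `state`
        q_value = float(q_net(state, action))  # critic's one-step evaluation
        if q_value > best_q:
            best_action, best_q = action, q_value
    return best_action                         # guidance-policy action
```

The target policy's update then adds an imitation term toward this guidance action, which is the regularization the abstract refers to.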
Neural-Rendezvous: Learning-based Robust Guidance and Control to Encounter Interstellar Objects
Hiroyasu Tsukamoto, Soon-Jo Chung, Benjamin Donitz, Michel Ingham, Declan Mages, Yashwanth Kumar Nakka
Interstellar objects (ISOs), astronomical objects not gravitationally bound to the Sun, are likely representatives of primitive materials invaluable in understanding exoplanetary star systems. Due to their poorly constrained orbits with generally high inclinations and relative velocities, however, exploring ISOs with conventional human-in-the-loop approaches is significantly challenging. This paper presents Neural-Rendezvous -- a deep learning-based guidance and control framework for encountering any fast-moving objects, including ISOs, robustly, accurately, and autonomously in real-time. It uses pointwise minimum norm tracking control on top of a guidance policy modeled by a spectrally-normalized deep neural network, where its hyperparameters are tuned with a newly introduced loss function directly penalizing the state trajectory tracking error. We rigorously show that, even in the challenging case of ISO exploration, Neural-Rendezvous provides 1) a high probability exponential bound on the expected spacecraft delivery error; and 2) a finite optimality gap with respect to the solution of model predictive control, both of which are indispensable especially for such a critical space mission. In numerical simulations, Neural-Rendezvous is demonstrated to achieve a terminal-time delivery error of less than 0.2 km for 99% of the ISO candidates with realistic state uncertainty, whilst retaining computational efficiency sufficient for real-time implementation.
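A minimal sketch of the two ingredients named above, under stated assumptions: a spectrally-normalized network for the guidance policy (via PyTorch's `spectral_norm`, which bounds each layer's Lipschitz constant) and a training loss that rolls an assumed differentiable one-step dynamics model forward and directly penalizes state-trajectory tracking error. Dimensions and the `dynamics` model are illustrative only, not the mission's actual models.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_guidance_net(state_dim=6, ctrl_dim=3, hidden=64):
    # Spectral normalization bounds each layer's Lipschitz constant,
    # the property underlying the robustness guarantees cited above.
    return nn.Sequential(
        spectral_norm(nn.Linear(state_dim, hidden)), nn.Tanh(),
        spectral_norm(nn.Linear(hidden, hidden)), nn.Tanh(),
        spectral_norm(nn.Linear(hidden, ctrl_dim)),
    )

def tracking_loss(net, dynamics, x0, x_ref):
    """Roll the closed loop forward; penalize deviation from the reference."""
    x, loss = x0, torch.zeros(())
    for t in range(len(x_ref)):
        u = net(x)           # guidance command at the current state
        x = dynamics(x, u)   # assumed differentiable one-step model
        loss = loss + ((x - x_ref[t]) ** 2).sum()
    return loss / len(x_ref)
```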