planning graph
Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases
Lei, Yongjia, Han, Haoyu, Rossi, Ryan A., Dernoncourt, Franck, Lipka, Nedim, Halappanavar, Mahantesh M, Tang, Jiliang, Wang, Yu
Text-rich Graph Knowledge Bases (TG-KBs) have become increasingly crucial for answering queries by providing textual and structural knowledge. However, current retrieval methods often retrieve these two types of knowledge in isolation, without considering their mutual reinforcement, and some hybrid methods even bypass structural retrieval entirely after neighbor aggregation. To fill this gap, we propose a Mixture of Structural-and-Textual Retrieval (MoR) to retrieve these two types of knowledge via a Planning-Reasoning-Organizing framework. In the Planning stage, MoR generates textual planning graphs delineating the logic for answering queries. Following the planning graphs, in the Reasoning stage, MoR interweaves structural traversal and textual matching to obtain candidates from TG-KBs. In the Organizing stage, MoR further reranks the fetched candidates based on their structural trajectories. Extensive experiments demonstrate the superiority of MoR in harmonizing structural and textual retrieval, with insights including uneven retrieval performance across different query logics and the benefits of integrating structural trajectories for candidate reranking. Our code is available at https://github.com/Yoega/MoR.
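The Planning-Reasoning-Organizing pipeline in this abstract can be sketched at toy scale. Everything below — the miniature TG-KB, the function names, and the bag-of-words scorer — is invented for illustration and is not taken from the MoR codebase:

```python
# Toy text-rich graph KB: node -> (text, neighbors). Purely illustrative.
TG_KB = {
    "paper_a": ("graph retrieval planning", ["author_x"]),
    "paper_b": ("image classification", ["author_x"]),
    "author_x": ("author of graph papers", ["paper_a", "paper_b"]),
}

def textual_score(query, node):
    """Textual matching: bag-of-words overlap between query and node text."""
    q, t = set(query.split()), set(TG_KB[node][0].split())
    return len(q & t) / max(len(q), 1)

def structural_candidates(seed, hops=1):
    """Reasoning step: structural traversal outward from a seed node."""
    frontier, seen = {seed}, {seed}
    for _ in range(hops):
        frontier = {n for f in frontier for n in TG_KB[f][1]} - seen
        seen |= frontier
    return seen

def mixture_retrieve(query, seed):
    """Organizing step: rerank traversal candidates by textual match."""
    cands = structural_candidates(seed, hops=1)
    return sorted(cands, key=lambda n: textual_score(query, n), reverse=True)
```

In this sketch the structural and textual signals interleave only once (traverse, then rerank); the actual method alternates them per the planning graph.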
Multi-Agent Transfer Learning via Temporal Contrastive Learning
Zeng, Weihao, Campbell, Joseph, Stepputtis, Simon, Sycara, Katia
This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning. The approach combines goal-conditioned policies with temporal contrastive learning to automatically discover meaningful sub-goals. It involves pre-training a goal-conditioned agent, fine-tuning it on the target domain, and using contrastive learning to construct a planning graph that guides the agent via sub-goals. Experiments on multi-agent Overcooked coordination tasks demonstrate improved sample efficiency, the ability to solve sparse-reward and long-horizon problems, and enhanced interpretability compared to baselines. The results highlight the effectiveness of integrating goal-conditioned policies with unsupervised temporal abstraction learning for complex multi-agent transfer learning. Compared to state-of-the-art baselines, our method achieves the same or better performance while requiring only 21.7% of the training samples.
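The idea of a planning graph over discovered sub-goals guiding the agent can be illustrated with a toy graph. The Overcooked-flavored sub-goal names and the BFS guidance below are assumptions for illustration only, not the paper's learned graph:

```python
from collections import deque

# Hypothetical sub-goal graph: sub-goal -> reachable sub-goals.
GRAPH = {
    "start": ["onion", "pot"],
    "onion": ["pot"],
    "pot": ["serve"],
    "serve": [],
}

def next_subgoal(current, goal):
    """BFS over the sub-goal graph; return the first step toward the goal."""
    q, parent = deque([current]), {current: None}
    while q:
        n = q.popleft()
        if n == goal:
            # Walk parents back until the node just after `current`.
            while parent[n] != current:
                n = parent[n]
            return n
        for m in GRAPH[n]:
            if m not in parent:
                parent[m] = n
                q.append(m)
    return None  # goal unreachable
```

The low-level goal-conditioned policy would then be conditioned on the returned sub-goal until it is reached.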
On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
Zhuo, Hankz Hankui, Chen, Xin, Pan, Rong
Plan synthesis aims to generate a course of actions or policies that transitions given initial states to goal states, provided domain models that can be designed by experts or learnt from training data or interactions with the world. Intrigued by claims of emergent planning capabilities in large language models (LLMs), prior work has investigated the planning effectiveness of LLMs without considering any use of off-the-shelf planning techniques. In this paper, we further probe the planning capability of LLMs by investigating their roles within off-the-shelf planning frameworks. To do this, we study the effectiveness of embedding LLMs into one of the best-known planning frameworks, graph-based planning, proposing a novel LLM-based planning framework with LLMs embedded at two levels of the planning graph, i.e., the mutual-constraint generation level and the constraint-solving level. We empirically demonstrate the effectiveness of our proposed framework in various planning domains.
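As background, the graph-based planning structure the paper embeds LLMs into — GraphPlan-style layer expansion with mutual-exclusion (mutex) constraints — can be sketched on an invented toy domain. The action names and the simplified mutex rule below are illustrative, not from the paper:

```python
# Each action: name -> (preconditions, add effects, delete effects).
ACTIONS = {
    "move_a_b": ({"at_a"}, {"at_b"}, {"at_a"}),
    "stay_a":   ({"at_a"}, {"at_a"}, set()),
}

def expand_level(props):
    """One planning-graph level: applicable actions and next proposition layer."""
    applicable = {a for a, (pre, _, _) in ACTIONS.items() if pre <= props}
    nxt = set(props)  # no-ops carry propositions forward
    for a in applicable:
        nxt |= ACTIONS[a][1]
    return applicable, nxt

def mutex(a1, a2):
    """Simplified mutex: one action deletes the other's precondition or effect."""
    pre1, add1, del1 = ACTIONS[a1]
    pre2, add2, del2 = ACTIONS[a2]
    return bool(del1 & (pre2 | add2)) or bool(del2 & (pre1 | add1))
```

In the paper's framing, an LLM could participate in producing such constraints and in resolving them; the sketch only shows the classical structure those roles plug into.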
Continuous Neural Algorithmic Planners
He, Yu, Veličković, Petar, Liò, Pietro, Deac, Andreea
Neural algorithmic reasoning studies the problem of learning algorithms with neural networks, especially with graph architectures. A recent proposal, XLVIN, reaps the benefits of using a graph neural network that simulates the value iteration algorithm in deep reinforcement learning agents. It allows model-free planning without access to privileged information about the environment, which is usually unavailable. However, XLVIN only supports discrete action spaces, and is hence nontrivially applicable to most tasks of real-world interest. We expand XLVIN to continuous action spaces by discretization, and evaluate several selective expansion policies to deal with the large planning graphs. Our proposal, CNAP, demonstrates how neural algorithmic reasoning can make a measurable impact in higher-dimensional continuous control settings, such as MuJoCo, bringing gains in low-data settings and outperforming model-free baselines.
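The discretization-plus-selective-expansion step described above can be illustrated with a toy sketch; the bin layout and the value-based top-k rule are assumptions for illustration, not the paper's exact expansion policies:

```python
def discretize(low, high, bins):
    """Bin a continuous 1-D action range into bin-center actions."""
    width = (high - low) / bins
    return [low + width * (i + 0.5) for i in range(bins)]

def selective_expand(values, k):
    """Expand only the top-k most promising discrete actions (by value)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]
```

Discretizing each action dimension multiplies the branching factor, which is why a selective expansion policy over the resulting planning graph is needed at all.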
Do
Many real-world planning problems require goals with deadlines and durative actions that consume resources. In this paper, we present Sapa, a domain-independent heuristic forward-chaining planner that can handle durative actions, metric resource constraints, and deadline goals. The main innovation of Sapa is the set of distance-based heuristics it employs to control its search. We consider both optimizing and satisficing search. For the former, we identify admissible heuristics for objective functions based on makespan and slack. For satisficing search, our heuristics are aimed at scalability with reasonable plan quality. Our heuristics are derived from the "relaxed temporal planning graph" structure, which is a generalization of planning graphs to temporal domains. We also provide techniques for adjusting the heuristic values to account for resource constraints. Our experimental results indicate that Sapa returns good-quality solutions for complex planning problems in reasonable time.
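The "relaxed temporal planning graph" idea — ignore delete effects and propagate the earliest time each fact can be achieved — can be sketched on an invented toy domain. The actions and durations below are illustrative, not from Sapa:

```python
ACTIONS = [
    # (preconditions, effects, duration); deletes are ignored in the relaxation
    ({"at_home"}, {"at_shop"}, 5.0),
    ({"at_shop"}, {"have_milk"}, 2.0),
]

def earliest_times(init):
    """Fixed-point propagation of earliest achievement times (delete-relaxed)."""
    t = {p: 0.0 for p in init}
    changed = True
    while changed:
        changed = False
        for pre, eff, dur in ACTIONS:
            if pre <= t.keys():  # all preconditions already achievable
                start = max(t[p] for p in pre)
                for e in eff:
                    if t.get(e, float("inf")) > start + dur:
                        t[e] = start + dur
                        changed = True
    return t
```

A goal fact's earliest time in this relaxed graph gives an admissible makespan estimate, which is the kind of distance-based heuristic the abstract describes.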
Disentangled Planning and Control in Vision Based Robotics via Reward Machines
Camacho, Alberto, Varley, Jacob, Jain, Deepali, Iscen, Atil, Kalashnikov, Dmitry
In this work we augment a Deep Q-Learning agent with a Reward Machine (DQRM) to increase the speed of learning vision-based policies for robot tasks, and to overcome some of the limitations of DQN that prevent it from converging to good-quality policies. A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function to guide it toward task completion. The reward machine can be used both for reward shaping and for informing the policy of which abstract state it is currently in. An abstract state is a high-level simplification of the current state, defined in terms of task-relevant features. These two supervisory signals, reward shaping and knowledge of the current abstract state, complement each other and can both be used to improve policy performance, as demonstrated on several vision-based robotic pick-and-place tasks. Particularly for vision-based robotics applications, it is often easier to build a reward machine than to get a policy to learn the task without this structure.
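A reward machine in the sense described above can be written as a small finite state machine. The pick-and-place events and reward values below are hypothetical, for illustration only:

```python
class RewardMachine:
    """Minimal reward machine: abstract states, event-driven transitions."""

    def __init__(self):
        # (state, event) -> (next_state, reward)
        self.delta = {
            ("start", "grasped"): ("holding", 0.1),
            ("holding", "placed"): ("done", 1.0),
        }
        self.state = "start"

    def step(self, event):
        """Advance on a task-relevant event; unknown events give zero reward."""
        nxt, r = self.delta.get((self.state, event), (self.state, 0.0))
        self.state = nxt
        return r
```

The `state` attribute is the abstract state the abstract mentions: it can be fed to the policy as an input, while the returned reward provides the shaping signal.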
Adjust Planning Strategies to Accommodate Reinforcement Learning Agents
The solution of many continuous decision problems can be described as the following process: the agent sets out from an initial state, passes through a series of intermediate states, and finally reaches the goal state. Imagine an agent in a maze that needs to find certain key positions and pass through them one by one to get out. The agent has two types of behavior: one is the micro-level action taken at every state, analogous to muscle activity, called reaction; the other is the change of trend in reactions taken over a period of time, analogous to human thought, called planning [15]. For the agent in the maze, a reaction can be each small movement step, and planning can be each decision about which position it should reach next. In a complicated scene with high-dimensional data streams, long-horizon decision processes, and sparse supervision signals, an agent trained only to react [9, 10] can hardly perform well (see Appendix A for a demonstration). However, combining reaction and planning [3, 4, 14] can significantly improve its capability. The essence of this improvement is that the agent has limited reaction capability, and the introduction of planning frees the agent from having to react across the whole task.
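The reaction/planning split described above can be sketched in a toy 1-D maze: the planner picks the next key position, and the reactor takes unit steps toward it. All names and the maze itself are invented for illustration:

```python
def plan(pos, key_positions):
    """Planning: choose the next key position not yet reached."""
    for k in key_positions:
        if pos < k:
            return k
    return None  # all key positions passed

def react(pos, target):
    """Reaction: one unit step toward the current target."""
    return pos + 1 if pos < target else pos

def run(start, key_positions):
    """Alternate planning and reaction until the maze is solved."""
    pos, visited = start, []
    target = plan(pos, key_positions)
    while target is not None:
        pos = react(pos, target)
        if pos == target:
            visited.append(target)
            target = plan(pos, key_positions)
    return visited
```

The planner is consulted only when a sub-target is reached, so the reactor never has to carry the whole long-horizon task on its own.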
Recent Advances in AI Planning
Although researchers have studied planning since the early days of AI, recent developments have revolutionized the field. Furthermore, work on propositional planning is closely related to the algorithms used in the autonomous controller for the National Aeronautics and Space Administration (NASA) Deep Space One spacecraft, launched in October 1998. As a result, our understanding of interleaved planning and execution has advanced as well as the speed with which we can solve classical planning problems. The goal of this survey is to explain these recent advances and suggest new directions for research. Because this article requires minimal AI background (for example, simple logic and basic search algorithms), it's suitable for a wide audience, but my treatment is not exhaustive because I don't have the space to discuss every active topic of planning research.