Planning & Scheduling
Theoretical Foundations for Semantic Cognition in Artificial Intelligence
This monograph presents a modular cognitive architecture for artificial intelligence grounded in the formal modeling of belief as structured semantic state. Belief states are defined as dynamic ensembles of linguistic expressions embedded within a navigable manifold, where operators enable assimilation, abstraction, nullification, memory, and introspection. Drawing from philosophy, cognitive science, and neuroscience, we develop a layered framework that enables self-regulating epistemic agents capable of reflective, goal-directed thought. At the core of this framework is the epistemic vacuum: a class of semantically inert cognitive states that serves as the conceptual origin of belief space. From this foundation, the Null Tower arises as a generative structure recursively built through internal representational capacities. The theoretical constructs are designed to be implementable in both symbolic and neural systems, including large language models, hybrid agents, and adaptive memory architectures. This work offers a foundational substrate for constructing agents that reason, remember, and regulate their beliefs in structured, interpretable ways.
FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS
Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in multi-step reasoning and calling search engines at appropriate steps. However, existing retrieval-augmented reasoning approaches rely on separate retrieval models, limiting the LRM's role in retrieval to deciding when to retrieve and how to query. This separation not only increases hardware and operational costs but also leads to errors in the retrieval process due to the representation bottleneck, a phenomenon where the retriever's embedding space is not expressive enough to meet the generator's requirements. To address this, we shift our perspective from sequence-to-sequence matching to locating the answer-containing paths within the corpus, and propose a novel framework called FREESON (Retriever-FREE Retrieval-Augmented ReaSONing). This framework enables LRMs to retrieve relevant knowledge on their own by acting as both a generator and retriever. To achieve this, we introduce a variant of the MCTS algorithm specialized for the retrieval task, which we call CT-MCTS (Corpus-Traversing Monte Carlo Tree Search). In this algorithm, LRMs traverse through the corpus toward answer-containing regions. Our results on five open-domain QA benchmarks, including single-hop and multi-hop questions, show that FREESON achieves an average improvement of 14.4% in EM and F1 over four multi-step reasoning models with a separate retriever, and it also performs comparably to the strongest baseline, surpassing it by 3% on PopQA and 2WikiMultihopQA.
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Santos, Pedro P., Sardinha, Alberto, Melo, Francisco S.
In this work, we contribute the first approach to solve infinite-horizon discounted general-utility Markov decision processes (GUMDPs) in the single-trial regime, i.e., when the agent's performance is evaluated based on a single trajectory. First, we provide some fundamental results regarding policy optimization in the single-trial regime, investigating which class of policies suffices for optimality, casting our problem as a particular MDP that is equivalent to our original problem, as well as studying the computational hardness of policy optimization in the single-trial regime. Second, we show how we can leverage online planning techniques, in particular a Monte-Carlo tree search algorithm, to solve GUMDPs in the single-trial regime. Third, we provide experimental results showcasing the superior performance of our approach in comparison to relevant baselines.
Cascaded Diffusion Models for Neural Motion Planning
Sharma, Mohit, Fishman, Adam, Kumar, Vikash, Paxton, Chris, Kroemer, Oliver
-- Robots in the real world need to perceive and move to goals in complex environments without collisions. A voiding collisions is especially difficult when relying on sensor perception and when goals are among clutter . Diffusion policies and other generative models have shown strong performance in solving local planning problems, but often struggle at avoiding all of the subtle constraint violations that characterize truly challenging global motion planning problems. In this work, we propose an approach for learning global motion planning using diffusion policies, allowing the robot to generate full trajectories through complex scenes and reasoning about multiple obstacles along the path. Our approach uses cascaded hierarchical models which unify global prediction and local refinement together with online plan repair to ensure the trajectories are collision free. Our method outperforms ( 5%) a wide variety of baselines on challenging tasks in multiple domains including navigation and manipulation. A key requirement for useful robots is that they can generalize motions to new environments. While classical motion planning algorithms often show good generalization [1], they require privileged information (e.g., full scene geometry) about their world; this has led to interest in neural motion planning approaches which can operate off of raw sensor data [2], [3], [4], [5], [6], and leverage large-scale behavior cloning to guide sampling [7], [2], [3]. However, neural motion planning approaches often struggle at generalizing to the challenging, cluttered environments in which traditional motion planners excel. This limitation is because learned approaches fail to satisfy all of the many constraints necessary for a trajectory to be successful for a high-dimensional multi-modal planning problem.
Histo-Planner: A Real-time Local Planner for MAVs Teleoperation based on Histogram of Obstacle Distribution
Wang, Ze, Gao, Zhenyu, Qu, Jingang, Morin, Pascal
Motivated by teleoperation applications in cluttered environments with limited computational power, we propose a local planner that does not require the knowledge or construction of a global map of the obstacles. The proposed solution consists of a real-time trajectory planning algorithm that relies on the histogram of obstacle distribution and a planner manager that triggers different planning modes depending on obstacles location around the MA V . The proposed solution is validated, for a teleoperation application, with both simulations and indoor experiments. Benchmark comparisons based on a designed simulation platform are also provided. I. INTRODUCTION Micro aerial vehicles (MA Vs) are used in many applications, such as rescue search, forestry monitoring, infrastructure maintenance, aerial photography, etc. When the MA V operates in cluttered environments, obstacle avoidance is a major problem. Solutions to this problem are highly dependent on the type of environment, the available onboard sensors, the availability of a global map of the environment, and the available computational power. While solutions to this problem rely on both perception and planning/navigation aspects (the classical sense and avoid scenario), the present paper focuses on the navigation aspect. Many traditional navigation methods are summarized in detail in [1].
Adaptive Bias Generalized Rollout Policy Adaptation on the Flexible Job-Shop Scheduling Problem
Kobrosly, Lotfi, Graviers, Marc-Emmanuel Coupvent des, Guettier, Christophe, Cazenave, Tristan
The Flexible Job-Shop Scheduling Problem (FJSSP) is an NP-hard combinatorial optimization problem, with several application domains, especially for manufacturing purposes. The objective is to efficiently schedule multiple operations on dissimilar machines. These operations are gathered into jobs, and operations pertaining to the same job need to be scheduled sequentially. Different methods have been previously tested to solve this problem, such as Constraint Solving, Tabu Search, Genetic Algorithms, or Monte Carlo Tree Search (MCTS). We propose a novel algorithm derived from the Generalized Nested Rollout Policy Adaptation, developed to solve the FJSSP. We report encouraging experimental results, as our algorithm performs better than other MCTS-based approaches, even if makespans obtained on large instances are still far from known upper bounds.
GRAML: Goal Recognition As Metric Learning
Goal Recognition (GR) is the problem of recognizing an agent's objectives based on observed actions. Recent data-driven approaches for GR alleviate the need for costly, manually crafted domain models. However, these approaches can only reason about a pre-defined set of goals, and time-consuming training is needed for new emerging goals. To keep this model-learning automated while enabling quick adaptation to new goals, this paper introduces GRAML: Goal Recognition As Metric Learning. GRAML uses a Siamese network to treat GR as a deep metric learning task, employing an RNN that learns a metric over an embedding space, where the embeddings for observation traces leading to different goals are distant, and embeddings of traces leading to the same goals are close. This metric is especially useful when adapting to new goals, even if given just one example observation trace per goal. Evaluated on a versatile set of environments, GRAML shows speed, flexibility, and runtime improvements over the state-of-the-art GR while maintaining accurate recognition.
Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent
Mo, Fanglin, Chen, Junzhe, Zhu, Haoxuan, Hu, Xuming
Mobile GUI agents execute user commands by directly interacting with the graphical user interface (GUI) of mobile devices, demonstrating significant potential to enhance user convenience. However, these agents face considerable challenges in task planning, as they must continuously analyze the GUI and generate operation instructions step by step. This process often leads to difficulties in making accurate task plans, as GUI agents lack a deep understanding of how to effectively use the target applications, which can cause them to become "lost" during task execution. To address the task planning issue, we propose SPlanner, a plug-and-play planning module to generate execution plans that guide vision language model(VLMs) in executing tasks. The proposed planning module utilizes extended finite state machines (EFSMs) to model the control logits and configurations of mobile applications. It then decomposes a user instruction into a sequence of primary function modeled in EFSMs, and generate the execution path by traversing the EFSMs. We further refine the execution path into a natural language plan using an LLM. The final plan is concise and actionable, and effectively guides VLMs to generate interactive GUI actions to accomplish user tasks. SPlanner demonstrates strong performance on dynamic benchmarks reflecting real-world mobile usage. On the AndroidWorld benchmark, SPlanner achieves a 63.8% task success rate when paired with Qwen2.5-VL-72B as the VLM executor, yielding a 28.8 percentage point improvement compared to using Qwen2.5-VL-72B without planning assistance.
ChatHTN: Interleaving Approximate (LLM) and Symbolic HTN Planning
Munoz-Avila, Hector, Aha, David W., Rizzo, Paola
We introduce ChatHTN, a Hierarchical Task Network (HTN) planner that combines symbolic HTN planning techniques with queries to ChatGPT to approximate solutions in the form of task decompositions. The resulting hierarchies interleave task decompositions generated by symbolic HTN planning with those generated by ChatGPT. Despite the approximate nature of the results generates by ChatGPT, ChatHTN is provably sound; any plan it generates correctly achieves the input tasks. We demonstrate this property with an open-source implementation of our system.
Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning
Brugnara, Irene, Valentini, Alessandro, Micheli, Andrea
Recent work investigated the use of Reinforcement Learning (RL) for the synthesis of heuristic guidance to improve the performance of temporal planners when a domain is fixed and a set of training problems (not plans) is given. The idea is to extract a heuristic from the value function of a particular (possibly infinite-state) MDP constructed over the training problems. In this paper, we propose an evolution of this learning and planning framework that focuses on exploiting the information provided by symbolic heuristics during both the RL and planning phases. First, we formalize different reward schemata for the synthesis and use symbolic heuristics to mitigate the problems caused by the truncation of episodes needed to deal with the potentially infinite MDP . Second, we propose learning a residual of an existing symbolic heuristic, which is a "correction" of the heuristic value, instead of eagerly learning the whole heuristic from scratch. Finally, we use the learned heuristic in combination with a symbolic heuristic using a multiple-queue planning approach to balance systematic search with imperfect learned information. We experimentally compare all the approaches, highlighting their strengths and weaknesses and significantly advancing the state of the art for this planning and learning schema.