Goto

Collaborating Authors

 Planning & Scheduling


Bayes-Adaptive Simulation-based Search with Value Function Approximation Arthur Guez,1,2 Nicolas Heess 2 David Silver 2 Peter Dayan

Neural Information Processing Systems

Bayes-adaptive planning offers a principled solution to the explorationexploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulationbased search with a novel value function approximation technique that generalises appropriately over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks.


Attractor Network Dynamics Enable Preplay and Rapid Path Planning in Maze-like Environments

Neural Information Processing Systems

Rodents navigating in a well-known environment can rapidly learn and revisit observed reward locations, often after a single trial. While the mechanism for rapid path planning is unknown, the CA3 region in the hippocampus plays an important role, and emerging evidence suggests that place cell activity during hippocampal "preplay" periods may trace out future goal-directed trajectories. Here, we show how a particular mapping of space allows for the immediate generation of trajectories between arbitrary start and goal locations in an environment, based only on the mapped representation of the goal. We show that this representation can be implemented in a neural attractor network model, resulting in bump-like activity profiles resembling those of the CA3 region of hippocampus. Neurons tend to locally excite neurons with similar place field centers, while inhibiting other neurons with distant place field centers, such that stable bumps of activity can form at arbitrary locations in the environment. The network is initialized to represent a point in the environment, then weakly stimulated with an input corresponding to an arbitrary goal location. We show that the resulting activity can be interpreted as a gradient ascent on the value function induced by a reward at the goal location. Indeed, in networks with large place fields, we show that the network properties cause the bump to move smoothly from its initial location to the goal, around obstacles or walls. Our results illustrate that an attractor network with hippocampal-like attributes may be important for rapid path planning.


APACE: Agile and Perception-Aware Trajectory Generation for Quadrotor Flights

arXiv.org Artificial Intelligence

Various perception-aware planning approaches have attempted to enhance the state estimation accuracy during maneuvers, while the feature matchability among frames, a crucial factor influencing estimation accuracy, has often been overlooked. In this paper, we present APACE, an Agile and Perception-Aware trajeCtory gEneration framework for quadrotors aggressive flight, that takes into account feature matchability during trajectory planning. We seek to generate a perception-aware trajectory that reduces the error of visual-based estimator while satisfying the constraints on smoothness, safety, agility and the quadrotor dynamics. The perception objective is achieved by maximizing the number of covisible features while ensuring small enough parallax angles. Additionally, we propose a differentiable and accurate visibility model that allows decomposition of the trajectory planning problem for efficient optimization resolution. Through validations conducted in both a photorealistic simulator and real-world experiments, we demonstrate that the trajectories generated by our method significantly improve state estimation accuracy, with root mean square error (RMSE) reduced by up to an order of magnitude. The source code will be released to benefit the community.


Meta-operators for Enabling Parallel Planning Using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

There is a growing interest in the application of Reinforcement Learning (RL) techniques to AI planning with the aim to come up with general policies. Typically, the mapping of the transition model of AI planning to the state transition system of a Markov Decision Process is established by assuming a one-to-one correspondence of the respective action spaces. In this paper, we introduce the concept of meta-operator as the result of simultaneously applying multiple planning operators, and we show that including meta-operators in the RL action space enables new planning perspectives to be addressed using RL, such as parallel planning. Our research aims to analyze the performance and complexity of including meta-operators in the RL process, concretely in domains where satisfactory outcomes have not been previously achieved using usual generalized planning models. The main objective of this article is thus to pave the way towards a redefinition of the RL action space in a manner that is more closely aligned with the planning perspective.


Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning Jean-Bastien Grill Michal Valko Rรฉmi Munos SequeL team, INRIA Lille - Nord Europe, France Google DeepMind, UK

Neural Information Processing Systems

You are a robot and you live in a Markov decision process (MDP) with a finite or an infinite number of transitions from state-action to next states. You got brains and so you plan before you act. Luckily, your roboparents equipped you with a generative model to do some Monte-Carlo planning. The world is waiting for you and you have no time to waste. You want your planning to be efficient.


Effective Underwater Glider Path Planning in Dynamic 3D Environments Using Multi-Point Potential Fields

arXiv.org Artificial Intelligence

Underwater gliders (UGs) have emerged as highly effective unmanned vehicles for ocean exploration. However, their operation in dynamic and complex underwater environments necessitates robust path-planning strategies. Previous studies have primarily focused on global energy or time-efficient path planning in explored environments, overlooking challenges posed by unpredictable flow conditions and unknown obstacles in varying and dynamic areas like fjords and near-harbor waters. This paper introduces and improves a real-time path planning method, Multi-Point Potential Field (MPPF), tailored for UGs operating in 3D space as they are constrained by buoyancy propulsion and internal actuation. The proposed MPPF method addresses obstacles, flow fields, and local minima, enhancing the efficiency and robustness of UG path planning. A low-cost prototype, the Research Oriented Underwater Glider for Hands-on Investigative Engineering (ROUGHIE), is utilized for validation. Through case studies and simulations, the efficacy of the enhanced MPPF method is demonstrated, highlighting its potential for real-world applications in underwater exploration.


PROSKILL: A formal skill language for acting in robotics

arXiv.org Artificial Intelligence

Acting is an important decisional function to ensure proper deliberation on an autonomous system (Ingrand and Ghallab, 2017). It often sits between planning and the platform, but unlike planning it is an online process, which must stay reactive to the dynamic of the environment and the platform and cannot devote resources to long computations and complex searches. Acting often relies on models, called skills, which specify how to perform actions (as an operational model), while the action models used for planning are more what is abstractly needed to perform the action (as a descriptive model) (Ghallab et al., 2016). The most basic skills need to connect to the commands made available by the functional level to the acting component, call them asynchronously, get execution status and result, but it also needs means to receive exogenous events as they occur in the environment. This action/command dispatching may also rely on preconditions and invariants checking, interruptions, temporal constraints, etc. Above the basic skills one often finds more complex skills, similar to programs with control structures to allow for local choices and local recoveries with test, branching, looping, parallel and asynchronous execution. Considering the expected functionalities of an acting component, its skill language/framework should provide the following features: Support for Validation and Verification (V&V). Notwithstanding the other functionalities, this is the feature the work presented in this paper focuses on. One cannot only rely on basic skills connecting to the robot commands, one also needs some programming primitives (e.g., test, branching, loop). 1


Online Adaptation of Sampling-Based Motion Planning with Inaccurate Models

arXiv.org Artificial Intelligence

Robotic manipulation relies on analytical or learned models to simulate the system dynamics. These models are often inaccurate and based on offline information, so that the robot planner is unable to cope with mismatches between the expected and the actual behavior of the system (e.g., the presence of an unexpected obstacle). In these situations, the robot should use information gathered online to correct its planning strategy and adapt to the actual system response. We propose a sampling-based motion planning approach that uses an estimate of the model error and online observations to correct the planning strategy at each new replanning. Our approach adapts the cost function and the sampling bias of a kinodynamic motion planner when the outcome of the executed transitions is different from the expected one (e.g., when the robot unexpectedly collides with an obstacle) so that future trajectories will avoid unreliable motions. To infer the properties of a new transition, we introduce the notion of context-awareness, i.e., we store local environment information for each executed transition and avoid new transitions with context similar to previous unreliable ones. This is helpful for leveraging online information even if the simulated transitions are far (in the state-and-action space) from the executed ones. Simulation and experimental results show that the proposed approach increases the success rate in execution and reduces the number of replannings needed to reach the goal.


Synchronized Dual-arm Rearrangement via Cooperative mTSP

arXiv.org Artificial Intelligence

Synchronized dual-arm rearrangement is widely studied as a common scenario in industrial applications. It often faces scalability challenges due to the computational complexity of robotic arm rearrangement and the high-dimensional nature of dual-arm planning. To address these challenges, we formulated the problem as cooperative mTSP, a variant of mTSP where agents share cooperative costs, and utilized reinforcement learning for its solution. Our approach involved representing rearrangement tasks using a task state graph that captured spatial relationships and a cooperative cost matrix that provided details about action costs. Taking these representations as observations, we designed an attention-based network to effectively combine them and provide rational task scheduling. Furthermore, a cost predictor is also introduced to directly evaluate actions during both training and planning, significantly expediting the planning process. Our experimental results demonstrate that our approach outperforms existing methods in terms of both performance and planning efficiency.


Relevance Score: A Landmark-Like Heuristic for Planning

arXiv.org Artificial Intelligence

The use of heuristics to guide search and limit the search space is an important component of modern planning systems. There is a well-established literature of methods that use heuristics to improve the computational efficiency of computing plans. The existence of a sound heuristic that can be computed quickly makes path planning in Euclidean space much more efficient than the more abstract search spaces used for task planning. One of the successful heuristics used for task planning is a count of the "landmarks" that remain to be reached from a given state Zhu and Givan (2003); Hoffmann et al. (2004); Richter and Westphal (2010); Keyder et al. (2010), where a landmark is a fact, action, or a logical formula over facts and/or actions, that is present in all valid solutions (i.e., sequence of actions) for a planning problem. This leads to a preference for actions (in plans) that are a landmark or achieve one.