"Planning is the process of generating (possibly partial) representations of future behavior prior to the use of such plans to constrain or control that behavior. The outcome is usually a set of actions, with temporal and other constraints on them, for execution by some agent or agents. As a core aspect of human intelligence, planning has been studied since the earliest days of AI and cognitive science. Planning research has led to many useful tools for real-world applications, and has yielded significant insights into the organization of behavior and the nature of reasoning about actions."
– Planning entry by Austin Tate in the MIT Encyclopedia of Cognitive Science.
Explainable planning is widely accepted as a prerequisite for autonomous agents to successfully work with humans. While there has been a lot of research on generating explanations of solutions to planning problems, explaining the absence of solutions remains an open and under-studied problem, even though such situations can be the hardest to understand or debug. In this paper, we show that hierarchical abstractions can be used to efficiently generate reasons for unsolvability of planning problems. In contrast to related work on computing certificates of unsolvability, we show that these methods can generate compact, human-understandable reasons for unsolvability. Empirical analysis and user studies show the validity of our methods as well as their computational efficacy on a number of benchmark planning domains.
Our work proposes to evidence this promise when applied to the built environment. Specifically, we offer to apply AI to floor plans analysis and generation. Our methodology follows two main intuitions (1) the creation of building plans is a non-trivial technical challenge, although encompassing standard optimization technics, and (2) the design of space is a sequential process, requiring successive design steps across different scales (urban scale, building scale, unit scale). Then, in order to harness these two realities, we have chosen nested Generative Adversarial Neural Networks or GANs. Such models enable us to capture more complexity across encountered floor plans and to break down the complexity by tackling problems through successive steps.
We propose a framework for learning discrete deterministic planning domains. In this framework, an agent learns the domain by observing the action effects through continuous features that describe the state of the environment after the execution of each action. Besides, the agent learns its perception function, i.e., a probabilistic mapping between state variables and sensor data represented as a vector of continuous random variables called perception variables. We define an algorithm that updates the planning domain and the perception function by (i) introducing new states, either by extending the possible values of state variables, or by weakening their constraints; (ii) adapts the perception function to fit the observed data (iii) adapts the transition function on the basis of the executed actions and the effects observed via the perception function. The framework is able to deal with exogenous events that happen in the environment.
Experience-based planning domains have been proposed to improve problem solving by learning from experience. They rely on acquiring and using task knowledge, i.e., activity schemata, for generating solutions to problem instances in a class of tasks. Using Three-Valued Logic Analysis (TVLA), we extend previous work to generate a set of conditions that determine the scope of applicability of an activity schema. The inferred scope is a bounded representation of a set of problems of potentially unbounded size, in the form of a 3-valued logical structure, which is used to automatically find an applicable activity schema for solving task problems. We validate this work in two classical planning domains.
To avoid the paper being thrown in the bin we provide this with a large, negative reward, say -1, and because the teacher is pleased with it being placed in the bin this nets a large positive reward, 1. To avoid the outcome where it continually gets passed around the room, we set the reward for all other actions to be a small, negative value, say -0.04. If we set this as a positive or null number then the model may let the paper go round and round as it would be better to gain small positives than risk getting close to the negative outcome.
Recent progress in reinforcement learning (RL) using self-game-play has shown remarkable performance on several board games (e.g., Chess and Go) as well as video games (e.g., Atari games and Dota2). It is plausible to consider that RL, starting from zero knowledge, might be able to gradually approximate a winning strategy after a certain amount of training. In this paper, we explore neural Monte-Carlo-Tree-Search (neural MCTS), an RL algorithm which has been applied successfully by DeepMind to play Go and Chess at a super-human level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka's Game-Theoretical Semantics, we propose the Zermelo Gamification (ZG) to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problem. The ZG also provides a specially designed neural MCTS. We use a combinatorial planning problem for which the ground-truth policy is efficiently computable to demonstrate that ZG is promising.
This paper develops a path planner that minimizes risk (e.g. motion execution) while maximizing accumulated reward (e.g., quality of sensor viewpoint) motivated by visual assistance or tracking scenarios in unstructured or confined environments. In these scenarios, the robot should maintain the best viewpoint as it moves to the goal. However, in unstructured or confined environments, some paths may increase the risk of collision; therefore there is a tradeoff between risk and reward. Conventional state-dependent risk or probabilistic uncertainty modeling do not consider path-level risk or is difficult to acquire. This risk-reward planner explicitly represents risk as a function of motion plans, i.e., paths. Without manual assignment of the negative impact to the planner caused by risk, this planner takes in a pre-established viewpoint quality map and plans target location and path leading to it simultaneously, in order to maximize overall reward along the entire path while minimizing risk. Exact and approximate algorithms are presented, whose solution is further demonstrated on a physical tethered aerial vehicle. Other than the visual assistance problem, the proposed framework also provides a new planning paradigm to address minimum-risk planning under dynamical risk and absence of substructure optimality and to balance the trade-off between reward and risk.
Recent work in decentralized, schedule-driven traffic control has demonstrated the ability to significantly improve traffic flow efficiency in complex urban road networks. However, in situations where vehicle volumes increase to the point that the physical capacity of a road network reaches or exceeds saturation, it has been observed that the effectiveness of a schedule-driven approach begins to degrade, leading to progressively higher network congestion. In essence, the traffic control problem becomes less of a scheduling problem and more of a queue management problem in this circumstance. In this paper we propose a composite approach to real-time traffic control that uses sensed information on queue lengths to influence scheduling decisions and gracefully shift the signal control strategy to queue management in high volume/high congestion settings. Specifically, queue-length information is used to establish weights for the sensed vehicle clusters that must be scheduled through a given intersection at any point, and hence bias the wait time minimization calculation. To compute these weights, we develop a model in which successive movement phases are viewed as different states of an Ising model, and parameters quantify strength of interactions. To ensure scalability, queue information is only exchanged between direct neighbors and the asynchronous nature of local intersection scheduling is preserved. We demonstrate the potential of the approach through microscopic traffic simulation of a real-world road network, showing a 60% reduction in average wait times over the baseline schedule-driven approach in heavy traffic scenarios. We also report initial field test results, which show the ability to reduce queues during heavy traffic periods.
Experience-based planning domains (EBPDs) have been recently proposed to improve problem solving by learning from experience. EBPDs provide important concepts for long-term learning and planning in robotics. They rely on acquiring and using task knowledge, i.e., activity schemata, for generating concrete solutions to problem instances in a class of tasks. Using Three-Valued Logic Analysis (TVLA), we extend previous work to generate a set of conditions as the scope of applicability for an activity schema. The inferred scope is a bounded representation of a set of problems of potentially unbounded size, in the form of a 3-valued logical structure, which allows an EBPD system to automatically find an applicable activity schema for solving task problems. We demonstrate the utility of our approach in a set of classes of problems in a simulated domain and a class of real world tasks in a fully physically simulated PR2 robot in Gazebo.
This paper presents a novel approach for learning STRIPS action models from examples that compiles this inductive learning task into a classical planning task. Interestingly, the compilation approach is flexible to different amounts of available input knowledge; the learning examples can range from a set of plans (with their corresponding initial and final states) to just a pair of initial and final states (no intermediate action or state is given). Moreover, the compilation accepts partially specified action models and it can be used to validate whether the observation of a plan execution follows a given STRIPS action model, even if this model is not fully specified.