Planning & Scheduling
IPC: A Benchmark Data Set for Learning with Graph-Structured Data
Ferber, Patrick, Ma, Tengfei, Huo, Siyu, Chen, Jie, Katz, Michael
Benchmark data sets are an indispensable ingredient of the evaluation of graph-based machine learning methods. We release a new data set, compiled from International Planning Competitions (IPC), for benchmarking graph classification, regression, and related tasks. Apart from the graph construction (based on AI planning problems) that is interesting in its own right, the data set possesses distinctly different characteristics from popularly used benchmarks. The data set, named IPC, consists of two self-contained versions, grounded and lifted, both including graphs of large and skewedly distributed sizes, posing substantial challenges for the computation of graph models such as graph kernels and graph neural networks. The graphs in this data set are directed and the lifted version is acyclic, offering the opportunity of benchmarking specialized models for directed (acyclic) structures. Moreover, the graph generator and the labeling are computer programmed; thus, the data set may be extended easily if a larger scale is desired. The data set is accessible from \url{https://github.com/IBM/IPC-graph-data}.
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
Gottesman, Omer, Liu, Yao, Sussex, Scott, Brunskill, Emma, Doshi-Velez, Finale
We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at every time step as to minimize the return error estimate along entire trajectories. Across a variety of domains, our mixture-based approach outperforms the individual models alone as well as state-of-the-art importance sampling-based estimators.
Timeline-based Planning and Execution with Uncertainty: Theory, Modeling Methodologies and Practice
Automated Planning is one of the main research field of Artificial Intelligence since its beginnings. Research in Automated Planning aims at developing general reasoners (i.e., planners) capable of automatically solve complex problems. Broadly speaking, planners rely on a general model characterizing the possible states of the world and the actions that can be performed in order to change the status of the world. Given a model and an initial known state, the objective of a planner is to synthesize a set of actions needed to achieve a particular goal state. The classical approach to planning roughly corresponds to the description given above. The timeline-based approach is a particular planning paradigm capable of integrating causal and temporal reasoning within a unified solving process. This approach has been successfully applied in many real-world scenarios although a common interpretation of the related planning concepts is missing. Indeed, there are significant differences among the existing frameworks that apply this technique. Each framework relies on its own interpretation of timeline-based planning and therefore it is not easy to compare these systems. Thus, the objective of this work is to investigate the timeline-based approach to planning by addressing several aspects ranging from the semantics of the related planning concepts to the modeling and solving techniques. Specifically, the main contributions of this PhD work consist of: (i) the proposal of a formal characterization of the timeline-based approach capable of dealing with temporal uncertainty; (ii) the proposal of a hierarchical modeling and solving approach; (iii) the development of a general purpose framework for planning and execution with timelines; (iv) the validation{\dag}of this approach in real-world manufacturing scenarios.
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
Soemers, Dennis J. N. J., Piette, รric, Stephenson, Matthew, Browne, Cameron
In recent years, state-of-the-art game-playing agents often involve policies that are trained in self-playing processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project with future goals including the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates, rather than MCTS visit counts. We empirically evaluate various properties of resulting policies, in a variety of board games.
Hulking 165-pound humanoid robot delicately 'walks a tightrope' of tiny blocks
Fascinating footage shows a robot using autonomous planning to precisely move along a treacherous path of narrow cinder blocks. Researchers trained the 165-pound'humanoid robot' to walk across narrow terrain by using human-like control, perception and planning algorithms. The video shows the robot, called Atlas, carefully moving across a balance beam using body control created using LIDAR. This system uses a pulsed laser to measure the distance between objects and this is procssed by the machine so it can step correctly on the narrow terrain. The researchers, from the Institute for Human & Machine Cognition (IHMC) in Florida, hope that the tech could be used for bomb squads or rescue missions.
Robust Goal Recognition with Operator-Counting Heuristics
Meneguzzi, Felipe, Pereira, Andrรฉ Grahl, Pereira, Ramon Fraga
Goal recognition is the problem of inferring the correct Operator-counting heuristics provide a unifying framework goal towards which an agent executes a plan, for a variety of sources of information from planning heuristics given a set of goal hypotheses, a domain model, [Hoffmann that provide both an estimate ofet al., 2004] and a (possibly noisy) sample of the plan being the total cost of a goal from any given state and and indication executed. This is a key problem in both cooperative of the actual operators likely to be in such plans. This and competitive agent interactions and recent information proves to be effective at differentiating between approaches have produced fast and accurate goal goal hypotheses in goal recognition, as we empirically show recognition algorithms.
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling
Phan, Thomy, Belzner, Lenz, Kiermeier, Marie, Friedrich, Markus, Schmid, Kyrill, Linnhoff-Popien, Claudia
State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory bounded approach to open-loop planning in large POMDPs, which optimizes a fixed size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.
How better handling of data can drive innovation in planning
Yet, despite the planning process itself generating vast amounts of data (directly and indirectly), technological innovation in the profession continues to lag behind. This lag might be because of the long legacy of planning's approach to filing, holding, and sharing its knowledge. Planning's greatest obstacle to digital innovation with data is just how much information is hard to access within a painfully analogue system. Future Cities Catapult's Digital Planning programme shows how new technologies that have revolutionised finance, medicine and education industries might also benefit planning. Take decision notices: Richly detailed, the information these documents hold โ critical pieces of data that are relevant for the lifespan of developments โ is too frequently lost to non-machine readable PDF filing.
From Abstractions to "Natural Languages" for Planning Agents
Despite our unique ability to use natural languages, we know little about their origins like how they are created and evolved. The answer lies deeply in the evolution of our cognitive and social abilities over a very long period of time which is beyond our scrutiny. Existing studies on the origin of languages are often focused on the emergence of specific language features (such as recursion) without supporting a comprehensive view. Investigation of restricted language representations, such as temporal logic, unfortunately does not reveal much about the impetus underlying language formation and evolution, since much of their construction is based on natural languages themselves. In this paper, we investigate the origin of "natural languages" in a restricted setting involving only planning agents. Similar to a common view that considers languages as a tool for grounding symbols to semantic meanings, we take the view that a language for planning agents is a tool for grounding symbols to physical configurations. From this perspective, a language is used by the agents to coordinate their behaviors during planning. With a few assumptions, we show that language is closely connected to a type of domain abstractions, based on which a language can be constructed. We study how such abstractions can be identified and discuss how to use them during planning. We apply our method to several domains, discuss the results, and relaxation of the assumptions made.
Non-myopic Planetary Exploration Combining In Situ and Remote Measurements
Kodgule, Suhit, Candela, Alberto, Wettergreen, David
Remote sensing can provide crucial information for planetary rovers. However, they must validate these orbital observations with in situ measurements. Typically, this involves validating hyperspectral data using a spectrometer on-board the field robot. In order to achieve this, the robot must visit sampling locations that jointly improve a model of the environment while satisfying sampling constraints. However, current planners follow sub-optimal greedy strategies that are not scalable to larger regions. We demonstrate how the problem can be effectively defined in an MDP framework and propose a planning algorithm based on Monte Carlo Tree Search, which is devoid of the common drawbacks of existing planners and also provides superior performance. We evaluate our approach using hyperspectral imagery of a well-studied geologic site in Cuprite, Nevada.