Planning & Scheduling
Hierarchical Planning in the IPC
Hรถller, D., Behnke, G., Bercher, P., Biundo, S., Fiorino, H., Pellier, D., Alford, R.
Over the last year, the amount of research in hierarchical planning has increased, leading to significant improvements in the performance of planners. However, the research is diverging and planners are somewhat hard to compare against each other. This is mostly caused by the fact that there is no standard set of benchmark domains, nor even a common description language for hierarchical planning problems. As a consequence, the available planners support a widely varying set of features and (almost) none of them can solve (or even parse) any problem developed for another planner. With this paper, we propose to create a new track for the IPC in which hierarchical planners will compete. This competition will result in a standardised description language, broader support for core features of that language among planners, a set of benchmark problems, a means to fairly and objectively compare HTN planners, and for new challenges for planners. Introduction When the International Planning Competition (IPC) started out in 1998 it aimed to include both classical and hierarchical planners as competitors (McDermott 2000).
Multi-Step Greedy and Approximate Real Time Dynamic Programming
Efroni, Yonathan, Ghavamzadeh, Mohammad, Mannor, Shie
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming (DP) based algorithm that combines planning and learning to find an optimal policy for an MDP. It is a planning algorithm because it uses the MDP's model (reward and transition functions) to calculate a 1-step greedy policy w.r.t.~an optimistic value function, by which it acts. It is a learning algorithm because it updates its value function only at the states it visits while interacting with the environment. As a result, unlike DP, RTDP does not require uniform access to the state space in each iteration, which makes it particularly appealing when the state space is large and simultaneously updating all the states is not computationally feasible. In this paper, we study a generalized multi-step greedy version of RTDP, which we call $h$-RTDP, in its exact form, as well as in three approximate settings: approximate model, approximate value updates, and approximate state abstraction. We analyze the sample, computation, and space complexities of $h$-RTDP and establish that increasing $h$ improves sample and space complexity, with the cost of additional offline computational operations. For the approximate cases, we prove that the asymptotic performance of $h$-RTDP is the same as that of a corresponding approximate DP -- the best one can hope for without further assumptions on the approximation errors. $h$-RTDP is the first algorithm with a provably improved sample complexity when increasing the lookahead horizon.
Double-oracle sampling method for Stackelberg Equilibrium approximation in general-sum extensive-form games
Karwowski, Jan, Maลdziuk, Jacek
The paper presents a new method for approximating Strong Stackelberg Equilibrium in general-sum sequential games with imperfect information and perfect recall. The proposed approach is generic as it does not rely on any specific properties of a particular game model. The method is based on iterative interleaving of the two following phases: (1) guided Monte Carlo Tree Search sampling of the Follower's strategy space and (2) building the Leader's behavior strategy tree for which the sampled Follower's strategy is an optimal response. The above solution scheme is evaluated with respect to expected Leader's utility and time requirements on three sets of interception games with variable characteristics, played on graphs. A comparison with three state-of-the-art MILP/LP-based methods shows that in vast majority of test cases proposed simulation-based approach leads to optimal Leader's strategies, while excelling the competitive methods in terms of better time scalability and lower memory requirements.
Learning Action Models from Disordered and Noisy Plan Traces
Zhuo, Hankz Hankui, Peng, Jing, Kambhampati, Subbarao
There is increasing awareness in the planning community that the burden of specifying complete domain models is too high, which impedes the applicability of planning technology in many real-world domains. Although there have many learning systems that help automatically learning domain models, most existing work assumes that the input traces are completely correct. A more realistic situation is that the plan traces are disordered and noisy, such as plan traces described by natural language. In this paper we propose and evaluate an approach for doing this. Our approach takes as input a set of plan traces with disordered actions and noise and outputs action models that can best explain the plan traces. We use a MAX-SAT framework for learning, where the constraints are derived from the given plan traces. Unlike traditional action models learners, the states in plan traces can be partially observable and noisy as well as the actions in plan traces can be disordered and parallel. We demonstrate the effectiveness of our approach through a systematic empirical evaluation with both IPC domains and the real-world dataset extracted from natural language documents.
Artificial Intelligence By Example
With Artificial Intelligence By Example, develop your own method for future AI solutions. Acquire advanced AI, machine learning, and deep learning design skills. Description Topics included: Become an Adaptive Thinker โข Think like a Machine โข Apply Machine Thinking to a Human Problem โข Become an Unconventional Innovator โข Manage the Power of Machine Learning and Deep Learning โข Don't Get Lost in Techniques โ Focus on Optimizing โข Your Solutions โข When and How to Use Artificial Intelligence โข Revolutions Designed for Some Corporations and Disruptive โข Innovations for Small to Large Companies โข Getting Your Neurons to Work โข Applying Biomimicking to Artificial Intelligence โข Conceptual Representation Learning โข Automated Planning and Scheduling โข AI and the Internet of Things (IoT) โข Optimizing Blockchains with AI โข Cognitive NLP Chatbots โข Improve the Emotional Intelligence Deficiencies of Chatbots โข Quantum Computers That Think
Efficient Multivariate Bandit Algorithm with Path Planning
Nie, Keyu, Zhang, Zezhong, Yuan, Ted Tao, Song, Rong, Burke, Pauline Berry
In this paper, we solve the arms exponential exploding issue in multivariate Multi-Armed Bandit (Multivariate-MAB) problem when the arm dimension hierarchy is considered. We propose a framework called path planning (TS-PP) which utilizes decision graph/trees to model arm reward success rate with m-way dimension interaction, and adopts Thompson sampling (TS) for heuristic search of arm selection. Naturally, it is quite straightforward to combat the curse of dimensionality using a serial processes that operates sequentially by focusing on one dimension per each process. For our best acknowledge, we are the first to solve Multivariate-MAB problem using graph path planning strategy and deploying alike Monte-Carlo tree search ideas. Our proposed method utilizing tree models has advantages comparing with traditional models such as general linear regression. Simulation studies validate our claim by achieving faster convergence speed, better efficient optimal arm allocation and lower cumulative regret.
Automatic Critical Mechanic Discovery in Video Games
Green, Michael Cerny, Khalifa, Ahmed, Barros, Gabriella A. B., Machado, Tiago, Togelius, Julian
We present a system that automatically discovers critical mechanics in a variety of video games within the General Video Game Artificial Intelligence (GVG-AI) framework using a combination of game description parsing and playtrace information. Critical mechanics are defined as the mechanics most necessary to trigger in order to perform well in the game. In a user study, human-identified mechanics are compared against system-identified mechanics to verify alignment between humans and the system. The results of the study demonstrate that our method is able to match humans with high consistency. Our system is further validated by comparing MCTS agents augmented with critical mechanic information against baseline MCTS agents on 4 games in GVG-AI. The augmented agents show a significant performance improvement over their baseline counterparts for all 4 tested games, demonstrating that knowledge of system-identified mechanics are responsible for improved performance.
About AI Planner Package Manager UI website
Use the AI Planner package to create agents that generate and execute plans. For example, use AI Planner to create an NPC, generate storylines, or validate game/simulation mechanics. The AI Planner package also includes authoring tools and a plan visualizer. To install this package, follow the instructions in the Package Manager documentation. During execution, it is also useful to view an agent's plan through the plan visualizer.
Allen's Interval Algebra Makes the Difference
Janhunen, Tomi, Sioutis, Michael
Allen's Interval Algebra constitutes a framework for reasoning about temporal information in a qualitative manner. In particular, it uses intervals, i.e., pairs of endpoints, on the timeline to represent entities corresponding to actions, events, or tasks, and binary relations such as precedes and overlaps to encode the possible configurations between those entities. Allen's calculus has found its way in many academic and industrial applications that involve, most commonly, planning and scheduling, temporal databases, and healthcare. In this paper, we present a novel encoding of Interval Algebra using answer-set programming (ASP) extended by difference constraints, i.e., the fragment abbreviated as ASP(DL), and demonstrate its performance via a preliminary experimental evaluation. Although our ASP encoding is presented in the case of Allen's calculus for the sake of clarity, we suggest that analogous encodings can be devised for other point-based calculi, too.
Modelling Bushfire Evacuation Behaviours
Bushfires pose a significant threat to Australia's regional areas. To minimise risk and increase resilience, communities need robust evacuation strategies that account for people's likely behaviour both before and during a bushfire. Agent-based modelling (ABM) offers a practical way to simulate a range of bushfire evacuation scenarios. However, the ABM should reflect the diversity of possible human responses in a given community. The Belief-Desire-Intention (BDI) cognitive model captures behaviour in a compact representation that is understandable by domain experts. Within a BDI-ABM simulation, individual BDI agents can be assigned profiles that determine their likely behaviour. Over a population of agents their collective behaviour will characterise the community response. These profiles are drawn from existing human behaviour research and consultation with emergency services personnel and capture the expected behaviours of identified groups in the population, both prior to and during an evacuation. A realistic representation of each community can then be formed, and evacuation scenarios within the simulation can be used to explore the possible impact of population structure on outcomes. It is hoped that this will give an improved understanding of the risks associated with evacuation, and lead to tailored evacuation plans for each community to help them prepare for and respond to bushfire.