Planning & Scheduling
Planning Under Partial Observability by Classical Replanning: Theory and Experiments
Bonet, Blai (Universidad Simon Bolivar) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)
Planning with partial observability can be formulated as a non-deterministic search problem in belief space. The problem is harder than classical planning as keeping track of beliefs is harder than keeping track of states, and searching for action policies is harder than searching for action sequences. In this work, we develop a framework for partial observability that avoids these limitations and leads to a planner that scales up to larger problems. For this, the class of problems is restricted to those in which 1) the non-unary clauses representing the uncertainty about the initial situation are nvariant, and 2) variables that are hidden in the initial situation do not appear in the body of conditional effects, which are all assumed to be deterministic. We show that such problems can be translated in linear time into equivalent fully observable non-deterministic planning problems, and that an slight extension of this translation renders the problem solvable by means of classical planners. The whole approach is sound and complete provided that in addition, the state-space is connected. Experiments are also reported.
The Role of Intention Recognition in the Evolution of Cooperative Behavior
Han, The Anh (Universidade Nova de Lisboa) | Pereira, Luis Moniz (Universidade Nova de Lisboa) | Santos, Francisco C. (Universidade Nova de Lisboa)
Given its ubiquity, scale and complexity, few problems have created the combined interest of so many unrelated areas as the evolution of cooperation. Using the tools of evolutionary game theory, here we address, for the first time, the role played by intention recognition in the final outcome of cooperation in large populations of self-regarding individuals. By equipping individuals with the capacity of assessing intentions of others in the course of repeated Prisoner's Dilemma interactions, we show how intention recognition opens a window of opportunity for cooperation to thrive, as it precludes the invasion of pure cooperators by random drift while remaining robust against defective strategies. Intention recognizers are able to assign an intention to the action of their opponents based on an acquired corpus of possible intentions. We show how intention recognizers can prevail against most famous strategies of repeated dilemmas of cooperation, even in the presence of errors. Our approach invites the adoption of other classification and pattern recognition mechanisms common among Humans, to unveil the evolution of complex cognitive processes in the context of social dilemmas.
Generalized Planning: Synthesizing Plans that Work for Multiple Environments
Hu, Yuxiao (University of Toronto) | Giacomo, Giuseppe De (Sapienza &ndash)
We give a formal definition of generalized planning that is independent of any representation formalism. We assume that our generalized plans must work on a set of deterministic environments, which are essentially unrelated to each other. We prove that generalized planning for a finite set of environments is always decidable and EXPSPACE-complete. Our proof is constructive and gives us a sound, complete and complexity-wise optimal technique. We also consider infinite sets of environments, and show that generalized planning for the infinite "one-dimensional problems," known in the literature to be recursively enumerable when restricted to finite-state plans, is EXPSPACE-decidable without sequence functions, and solvable by generalized planning for finite sets.
Nested Rollout Policy Adaptation for Monte Carlo Tree Search
Rosin, Christopher D. (Parity Computing, Inc.)
Monte Carlo tree search (MCTS) methods have had recent success in games, planning, and optimization. MCTS uses results from rollouts to guide search; a rollout is a path that descends the tree with a randomized decision at each ply until reaching a leaf. MCTS results can be strongly influenced by the choice of appropriate policy to bias the rollouts. Most previous work on MCTS uses static uniform random or domain-specific policies. We describe a new MCTS method that dynamically adapts the rollout policy during search, in deterministic optimization problems. Our starting point is Cazenave's original Nested Monte Carlo Search (NMCS), but rather than navigating the tree directly we instead use gradient ascent on the rollout policy at each level of the nested search. We benchmark this new Nested Rollout Policy Adaptation (NRPA) algorithm and examine its behavior. Our test problems are instances of Crossword Puzzle Construction and Morpion Solitaire. Over moderate time scales NRPA can substantially improve search efficiency compared to NMCS, and over longer time scales NRPA improves upon all previous published solutions for the test problems. Results include a new Morpion Solitaire solution that improves upon the previous human-generated record that had stood for over 30 years.
Multi-Agent Plan Recognition with Partial Team Traces and Plan Libraries
Zhuo, Hankz Hankui (Sun Yat-sen University) | Li, Lei (Sun Yat-sen University)
Multi-Agent Plan Recognition (MAPR) seeks to proposed to formalize MAPR with a new model, revealing identify the dynamic team structures and team behaviors the distinction between the hardness of single and multi-agent from the observed activity sequences (team plan recognition, and solve MAPR problems in the model using traces) of a set of intelligent agents, based on a a first-cut approach, provided that a fully observed team library of known team activity sequences (team trace and a library of full team plans were given as input plans). Previous MAPR systems require that team [Banerjee et al., 2010]; etc. traces and team plans are fully observed. In this Despite the success of previous approaches, they either assume paper we relax this constraint, i.e., team traces and that agents in the same team can only execute a common team plans are allowed to be partial. This is an important activity, i.e., coordinated activities of agents are not allowed task in applying MAPR to real-world domains, in a team, or require that the team trace and team plans are since in many applications it is often difficult complete, i.e., missing values (activities that are missing) are to collect full team traces or team plans due not allowed. In many real-world applications, however, it is to environment limitations, e.g., military operation.
Online Planning for Ad Hoc Autonomous Agent Teams
Wu, Feng (University of Science and Technology of China) | Zilberstein, Shlomo (University of Massachusetts Amherst) | Chen, Xiaoping (University of Science and Technology of China)
We propose a novel online planning algorithm for ad hoc team settings — challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains.
Timing Tweets to Increase Effectiveness of Information Campaigns
Dabeer, Onkar (Tata Institute of Fundamental Research) | Mehendale, Prachi (Tata Institute of Fundamental Research) | Karnik, Aditya (India Science Lab) | Saroop, Atul (India Science Lab)
Microblogging websites such as Twitter are increasingly being used by businesses/campaigners for timely dissemination of information to their followers. The diffusion of a tweet depends on several factors: the activity of the follower nodes, the responsiveness of follower nodes to tweets from the source node, the out-degree of the follower nodes, the content of recent related tweets seen by the follower node, etc. Using such factors, in this paper, we propose a framework to measure the effectiveness of an information campaign over Twitter. We consider a positive as well as a negative metric to measure the impact of a tweet: while retweets are used to measure the positive impact, the lack of a timely response from an active follower node is taken as a potential negative impact. We investigate the scheduling of tweets to increase the net positive impact while keeping the net negative impact below a desired level. We propose and study several scheduling algorithms by casting the problem in a Markov Decision Process (MDP) framework. In order to compare our algorithms, we estimate the model parameters from tweet data collected using the Twitter API from an arbitrarily selected node and its 6837 followers over several months. For this dataset, we find that if successive tweets in the campaign are novel, then substantial gains over user activity based scheduling can be obtained by scheduling tweets in time slots where the ratio of the expected positive and negative metrics is high. We call this the MaxRatio policy and we show that it is optimal under certain conditions. In cases where we are not certain about the response of users to successive related tweets, we identify another algorithm (which we call MaxReach) as a robust alternative.
Knowledge Transfer between Automated Planners
Fernandez, Susana (Universidad Carlos III de Madrid) | Aler, Ricardo (Universidad Carlos III de Madrid) | Borrajo, Daniel (Universidad Carlos III de Madrid)
In this article, we discuss the problem of transferring search heuristics from one planner to another. More specifically, we demonstrate how to transfer the domain-dependent heuristics acquired by one planner into a second planner. Our motivation is to improve the efficiency and the efficacy of the second planner by allowing it to use the transferred heuristics to capture domain regularities that it would not otherwise recognize. Our experimental results show that the transferred knowledge does improve the second planner's performance on novel tasks over a set of seven benchmark planning domains.
Knowledge Transfer between Automated Planners
Fernandez, Susana (Universidad Carlos III de Madrid) | Aler, Ricardo (Universidad Carlos III de Madrid) | Borrajo, Daniel (Universidad Carlos III de Madrid)
In this article, we discuss the problem of transferring search heuristics from one planner to another. More specifically, we demonstrate how to transfer the domain-dependent heuristics acquired by one planner into a second planner. Our motivation is to improve the efficiency and the efficacy of the second planner by allowing it to use the transferred heuristics to capture domain regularities that it would not otherwise recognize. Our experimental results show that the transferred knowledge does improve the second planner's performance on novel tasks over a set of seven benchmark planning domains.
Scalable Distributed Monte-Carlo Tree Search
Yoshizoe, Kazuki (The University of Tokyo) | Kishimoto, Akihiro (Tokyo Institute of Technology and Japan Science and Technology Agency) | Kaneko, Tomoyuki (The University of Tokyo) | Yoshimoto, Haruhiro (The University of Tokyo) | Ishikawa, Yutaka (The University of Tokyo)
Monte-Carlo Tree Search (MCTS) is remarkably successful in two-player games, but parallelizing MCTS has been notoriously difficult to scale well, especially in distributed environments. For a distributed parallel search, transposition-table driven scheduling (TDS) is known to be efficient in several domains. We present a massively parallel MCTS algorithm, that applies the TDS parallelism to the Upper Confidence bound Applied to Trees (UCT) algorithm, which is the most representative MCTS algorithm. To drastically decrease communication overhead, we introduce a reformulation of UCT called Depth-First UCT. The parallel performance of the algorithm is evaluated on clusters using up to 1,200 cores in artificial game-trees. We show that this approach scales well, achieving 740-fold speedups in the best case.