Goto

Collaborating Authors

 Planning & Scheduling


Multiple Policy Value Monte Carlo Tree Search

arXiv.org Artificial Intelligence

Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly, and therefore can lead to more MCTS simulations and bigger search trees with the same budget. This paper introduces a new method called the multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain advantages of each network, where two PV-NNs f_S and f_L are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms single PV-NN with policy value MCTS, called PV-MCTS. Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.


Guarantees for Sound Abstractions for Generalized Planning (Extended Paper)

arXiv.org Artificial Intelligence

Generalized planning is about finding plans that solve collections of planning instances, often infinite collections, rather than single instances. Recently it has been shown how to reduce the planning problem for generalized planning to the planning problem for a qualitative numerical problem; the latter being a reformulation that simultaneously captures all the instances in the collection. An important thread of research thus consists in finding such reformulations, or abstractions, automatically. A recent proposal learns the abstractions inductively from a finite and small sample of transitions from instances in the collection. However, as in all inductive processes, the learned abstraction is not guaranteed to be correct for the whole collection. In this work we address this limitation by performing an analysis of the abstraction with respect to the collection, and show how to obtain formal guarantees for generalization. These guarantees, in the form of first-order formulas, may be used to 1) define subcollections of instances on which the abstraction is guaranteed to be sound, 2) obtain necessary conditions for generalization under certain assumptions, and 3) do automated synthesis of complex invariants for planning problems. Our framework is general, it can be extended or combined with other approaches, and it has applications that go beyond generalized planning.


Learning Portable Representations for High-Level Planning

arXiv.org Artificial Intelligence

We present a framework for autonomously learning a portable representation that describes a collection of low-level continuous environments. We show that these abstract representations can be learned in a task-independent egocentric space specific to the agent that, when grounded with problem-specific information, are provably sufficient for planning. We demonstrate transfer in two different domains, where an agent learns a portable, task-independent symbolic vocabulary, as well as rules expressed in that vocabulary, and then learns to instantiate those rules on a per-task basis. This reduces the number of samples required to learn a representation of a new task.


Dynamic Epistemic Logic with ASP Updates: Application to Conditional Planning

arXiv.org Artificial Intelligence

Dynamic Epistemic Logic (DEL) is a family of multimodal logics that has proved to be very successful for epistemic reasoning in planning tasks. In this logic, the agent's knowledge is captured by modal epistemic operators whereas the system evolution is described in terms of (some subset of) dynamic logic modalities in which actions are usually represented as semantic objects called event models. In this paper, we study a variant of DEL, that wecall DEL[ASP], where actions are syntactically described by using an Answer Set Programming (ASP) representation instead of event models. This representation directly inherits high level expressive features like indirect effects, qualifications, state constraints, defaults, or recursive fluents that are common in ASP descriptions of action domains. Besides, we illustrate how this approach can be applied for obtaining conditional plans in single-agent, partially observable domains where knowledge acquisition may be represented as indirect effects of actions.


Balancing Goal Obfuscation and Goal Legibility in Settings with Cooperative and Adversarial Observers

arXiv.org Artificial Intelligence

In order to be useful in the real world, AI agents need to plan and act in the presence of others, who may include adversarial and cooperative entities. In this paper, we consider the problem where an autonomous agent needs to act in a manner that clarifies its objectives to cooperative entities while preventing adversarial entities from inferring those objectives. We show that this problem is solvable when cooperative entities and adversarial entities use different types of sensors and/or prior knowledge. We develop two new solution approaches for computing such plans. One approach provides an optimal solution to the problem by using an IP solver to provide maximum obfuscation for adversarial entities while providing maximum legibility for cooperative entities in the environment, whereas the other approach provides a satisficing solution using heuristic-guided forward search to achieve preset levels of obfuscation and legibility for adversarial and cooperative entities respectively. We show the feasibility and utility of our algorithms through extensive empirical evaluation on problems derived from planning benchmarks.


Q&A: Travel startup paves way for industry consolidation (Includes interview)

#artificialintelligence

In May 2019 Google announced the consolidation of all its travel features. Google Maps, Trips, Hotels and Flights will combine to make one Google Travel, easing the process for vacation planning. Travel startup VacationRenter, which launched last year, pioneered this model for vacation rentals, based on an artificial intelligence driven platform. According to VacationRenter's newly appointed COO, ex-Googler Marco del Rosario, both Google Travel and VacationRenter are early adopters of a pivotal strategy for today's travel technology: consolidation. Digital Journal: How has the world of travel changed in recent years?



Minimizing the Negative Side Effects of Planning with Reduced Models

arXiv.org Artificial Intelligence

Reduced models of large Markov decision processes accelerate planning by considering a subset of outcomes for each state-action pair. This reduction in reachable states leads to replanning when the agent encounters states without a precomputed action during plan execution. However, not all states are suitable for replanning. In the worst case, the agent may not be able to reach the goal from the newly encountered state. Agents should be better prepared to handle such risky situations and avoid replanning in risky states. Hence, we consider replanning in states that are unsafe for deliberation as a negative side effect of planning with reduced models. While the negative side effects can be minimized by always using the full model, this defeats the purpose of using reduced models. The challenge is to plan with reduced models, but somehow account for the possibility of encountering risky situations. An agent should thus only replan in states that the user has approved as safe for replanning. To that end, we propose planning using a portfolio of reduced models, a planning paradigm that minimizes the negative side effects of planning using reduced models by alternating between different outcome selection approaches. We empirically demonstrate the effectiveness of our approach on three domains: an electric vehicle charging domain using real-world data from a university campus and two benchmark planning problems.


A Correctness Result for Synthesizing Plans With Loops in Stochastic Domains

arXiv.org Artificial Intelligence

In AI, FSCs are much sought after for automated planning paradigms such as generalized planning, as in Figure 1, where one attempts to synthesize a controller that works in multiple initial states. Such controllers are usually handwritten by domain experts, which is problematic when expert knowledge is either unavailable or unreliable. To that end, the automated synthesis of FSCs has received considerable attention in recent years, e.g., [16, 7, 26, 24, 11, 27]. Of course, FSCs synthesis is closely related to program synthesis [16], and FSCs are frequently seen as program-like plans [17], and recent synthesis literature involves an exciting exchange of technical insights between the two fields [26]; representative examples include the use of program synthesis to infer high-level action types [25], and the use of partial order planning for imperative program synthesis [13]. Naturally, from an algorithmic perspective, the two most immediate questions are: in which sense are controllers correct, and how do we synthesize controllers that are provably correct?


Multi-Robot Informative Path Planning in Unknown Environments Through Continuous Region Partitioning

AAAI Conferences

Information collection is an important application of multi-robot systems especially in environments that are difficult to operate for humans. The objective of the robots is to maximize information collection from the environment while remaining in their path-length budgets. In this paper, we propose a novel multi-robot information collection algorithm that uses a continuous region partitioning approach to efficiently divide an unknown environment among the robots based on the discovered obstacles in the area, for better load-balancing. Our algorithm gracefully handles situations when some of the robots cannot communicate with other robots due to limited communication ranges.