Goto

Collaborating Authors

 abstract graph





Automating the Refinement of Reinforcement Learning Specifications

arXiv.org Artificial Intelligence

Logical specifications have been shown to help reinforcement learning algorithms in achieving complex tasks. However, when a task is under-specified, agents might fail to learn useful policies. In this work, we explore the possibility of improving coarse-grained logical specifications via an exploration-guided strategy. We propose \textsc{AutoSpec}, a framework that searches for a logical specification refinement whose satisfaction implies satisfaction of the original specification, but which provides additional guidance therefore making it easier for reinforcement learning algorithms to learn useful policies. \textsc{AutoSpec} is applicable to reinforcement learning tasks specified via the SpectRL specification logic. We exploit the compositional nature of specifications written in SpectRL, and design four refinement procedures that modify the abstract graph of the specification by either refining its existing edge specifications or by introducing new edge specifications. We prove that all four procedures maintain specification soundness, i.e. any trajectory satisfying the refined specification also satisfies the original. We then show how \textsc{AutoSpec} can be integrated with existing reinforcement learning algorithms for learning policies from logical specifications. Our experiments demonstrate that \textsc{AutoSpec} yields promising improvements in terms of the complexity of control tasks that can be solved, when refined logical specifications produced by \textsc{AutoSpec} are utilized.



Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees

arXiv.org Artificial Intelligence

Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.


Abstracting Abstraction in Search with Applications to Planning

AAAI Conferences

Abstraction has been used in search and planning from the very beginning of AI. Many different methods and formalisms for abstraction have been proposed in the literature but they have been designed from various points of view and with varying purposes. Hence, these methods have been notoriously difficult to analyse and compare in a structured way. In order to improve upon this situation, we present a coherent and flexible framework for modelling abstraction (and abstraction-like) methods based on transformations on labelled graphs. Transformations can have certain method properties that are inherent in the abstraction methods and describe their fundamental modelling characteristics, and they can have certain instance properties that describe algorithmic and computational characteristics of problem instances. The usefulness of the framework is demonstrated by applying it to problems in both search and planning. First, we show that we can capture many search abstraction concepts (such as avoidance of backtracking between levels) and that we can put them into a broader context. We further model five different abstraction concepts from the planning literature. Analysing what method properties they have highlights their fundamental differences and similarities. Finally, we prove that method properties sometimes imply instance properties. Taking also those instance properties into account reveals important information about computational aspects of the five methods.


A Comparison of High-Level Approaches for Speeding Up Pathfinding

AAAI Conferences

Most games being shipped today use some form of high-level abstraction such as a navmesh or waypoint graph for path planning. These structures can generally be represented in a form which is compact enough to meet the tight memory constraints in a game. But, when such a graph grows too large, finding paths can still be a complex task. This challenge was faced in Dragon Age: Origins and solved by adding an additional level of abstraction.In the last few years a variety of novel approaches have been developed for finding optimal paths through graphs with specific design applications for road networks. Currently these techniques cannot be feasibly applied to the lowest detail of movement possible in a game map, but can be applied to the high-level abstractions which are commonly found in games.In this paper we describe the pathfinding challenge faced before shipping the title Dragon Age: Origins and perform a postmortem analysis on the extended abstraction that was used in comparison to building more advanced heuristics or the use of contraction hierarchies. We show that contraction hierarchies and abstractions have similar overhead and performance and are both useful approaches for high-level planning in games.


Edge Partitioning in Parallel Structured Duplicate Detection

AAAI Conferences

We show how edge partitioning, a technique originally developed for external-memory search, can be used to reduce the number of slow synchronization operations needed in parallel graph search. We show that edge partitioning improves on a previous technique called parallel structured duplicate detection by allowing a higher degree of concurrency, even for search problems with little or no inherent locality. For domain-independent graph search, we also show that edge partitioning significantly improves search speed by improving the efficiency of precondition checking. We demonstrate the effectiveness of this approach to parallel graph search for domain-independent STRIPS planning.


Additive Heuristic for Four-Connected Gridworlds

AAAI Conferences

Memory-based heuristic techniques have been used to effectively reduce search times in implicit graphs. Recently, these techniques have been applied to improving search times in explicit graphs. This paper presents a new memory-based, additive heuristic that can be used on a type of explicit graph: the four-connected gridworld. The heuristic reduces the number of expanded nodes by up to five times, reduces execution time by up to 29 times, and can efficiently accommodate graph changes.