Symbolic control techniques aim to satisfy complex logic specifications. A critical step in these techniques is the construction of a symbolic (discrete) abstraction, a finite-state system whose behaviour mimics that of a given continuous-state system. The methods used to compute symbolic abstractions, however, require knowledge of an accurate closed-form model. To generalize them to systems with unknown dynamics, we present a new data-driven approach that does not require closed-form dynamics, instead relying only the ability to evaluate successors of each state under given inputs. To provide guarantees for the learned abstraction, we use the Probably Approximately Correct (PAC) statistical framework. We first introduce a PAC-style behavioural relationship and an appropriate refinement procedure. We then show how the symbolic abstraction can be constructed to satisfy this new behavioural relationship. Moreover, we provide PAC bounds that dictate the number of data required to guarantee a prescribed level of accuracy and confidence. Finally, we present an illustrative example.
Counterexample-guided abstraction refinement (CEGAR) is a method for incrementally computing abstractions of transition systems. We propose a CEGAR algorithm for computing abstraction heuristics for optimal classical planning. Starting from a coarse abstraction of the planning task, we iteratively compute an optimal abstract solution, check if and why it fails for the concrete planning task and refine the abstraction so that the same failure cannot occur in future iterations. A key ingredient of our approach is a novel class of abstractions for classical planning tasks that admits efficient and very fine-grained refinement. Since a single abstraction usually cannot capture enough details of the planning task, we also introduce two methods for producing diverse sets of heuristics within this framework, one based on goal atoms, the other based on landmarks. In order to sum their heuristic estimates admissibly we introduce a new cost partitioning algorithm called saturated cost partitioning. We show that the resulting heuristics outperform other state-of-the-art abstraction heuristics in many benchmark domains.
This discussion on'The Abstractionism of probability" is perhaps one of the first in the world to be discussed publicly. It has to be understood that, this discussion has evolved out of various other discussions with mathematicians, philosophers, doctors and engineers and many other participants including rappers, mainstream musicians, artists, actors and actresses. Because this was the subject matter for a documentary, to keep its serenity and purity, no filmmakers of any kind were interviewed. The film is in the making.
Solutions to most complex tasks can be decomposed into simpler, intermediate skills, reusable across wider ranges of problems. We follow this concept and introduce Hindsight Off-policy Options (HO2), a new algorithm for efficient and robust option learning. The algorithm relies on critic-weighted maximum likelihood estimation and an efficient dynamic programming inference procedure over off-policy trajectories. We can backpropagate through the inference procedure through time and the policy components for every time-step, making it possible to train all component's parameters off-policy, independently of the data-generating behavior policy. Experimentally, we demonstrate that HO2 outperforms competitive baselines and solves demanding robot stacking and ball-in-cup tasks from raw pixel inputs in simulation. We further compare autoregressive option policies with simple mixture policies, providing insights into the relative impact of two types of abstractions common in the options framework: action abstraction and temporal abstraction. Finally, we illustrate challenges caused by stale data in off-policy options learning and provide effective solutions.