Learning Probabilistic Temporal Safety Properties from Examples in Relational Domains

Gavin Rens, Wen-Chi Yang, Jean-François Raskin, Luc De Raedt

arXiv.org Artificial Intelligence 

Many recent publications report on methods for achieving safety in Markov Decision Processes (MDPs), where temporal logic (safety) specifications must be satisfied [1-4]. However, it is typically assumed that 1) the safety specification is given, and 2) the states in the underlying MDP are unstructured. In this paper, we are interested in 1) learning the safety specification from examples, and 2) working with relational MDPs. More specifically, in our learning setting we assume that there is a domain expert who is presented with a set of system states E, a probability threshold α and a step-bound k (the number of action executions). If the expert believes that the system, starting in a state s ∈ E, will perform actions that lead to a dangerous temporal situation within k steps with probability at least α, then she labels s as dangerous; otherwise, she labels it as safe. Given this set E of labeled states, we want to learn a compact temporal logic formula summarizing the expert's advice. There are at least three reasons to infer a property (expressed as a temporal logic formula) from an expert's advice: firstly, to obtain a concise, human-interpretable expression of some aspects of the domain [5-7]; secondly, to verify a system's control behavior (policy) w.r.t. a set of (safety) standards [6, 8]; and thirdly, to use the (safety) property to devise strategies for the system or agent to avoid undesirable situations [8-10]. Furthermore, we consider systems that can be modelled as relational MDPs (RMDPs).
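The labeling rule described above can be illustrated with a minimal sketch. It is not the paper's method: it assumes a fixed policy, so that the MDP reduces to a plain Markov chain, and it replaces the "dangerous temporal situation" with a hypothetical set of bad states, so that the expert's judgment becomes a step-bounded reachability probability compared against α. The names `bounded_reach_prob`, `label_states`, and the toy chain are illustrative assumptions.

```python
from typing import Dict, Set

# Assumption: chain[s][t] is the probability of moving from state s to state t
# under some fixed policy (a Markov chain, not a relational MDP).
Chain = Dict[str, Dict[str, float]]


def bounded_reach_prob(chain: Chain, bad: Set[str], k: int) -> Dict[str, float]:
    """Probability of reaching a state in `bad` within k steps, per start state."""
    # With 0 steps left, only the bad states themselves count as reached.
    p = {s: (1.0 if s in bad else 0.0) for s in chain}
    for _ in range(k):
        # One backward step: bad states stay at 1, others average over successors.
        p = {
            s: 1.0 if s in bad else sum(pr * p[t] for t, pr in chain[s].items())
            for s in chain
        }
    return p


def label_states(chain: Chain, bad: Set[str], k: int, alpha: float) -> Dict[str, str]:
    """Label a state 'dangerous' if its k-step reachability probability is >= alpha."""
    p = bounded_reach_prob(chain, bad, k)
    return {s: ("dangerous" if p[s] >= alpha else "safe") for s in chain}


# Hypothetical three-state chain used only for illustration.
toy_chain: Chain = {
    "s0": {"s0": 0.5, "s1": 0.5},
    "s1": {"s1": 0.5, "bad": 0.5},
    "bad": {"bad": 1.0},
}

labels = label_states(toy_chain, bad={"bad"}, k=2, alpha=0.5)
```

With k = 2 and α = 0.5, state `s1` reaches `bad` with probability 0.75 and is labeled dangerous, while `s0` reaches it with probability 0.25 and is labeled safe. In the setting of the abstract, such labels come from the expert rather than from a known model, and the learning task is to recover a compact formula consistent with them.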
