Parker, David
Nested Value Iteration for Partially Satisfiable Co-Safe LTL Specifications (Extended Abstract)
Lacerda, Bruno (University of Birmingham) | Parker, David (University of Birmingham) | Hawes, Nick (University of Birmingham)
We describe our recent work on cost-optimal policy generation, for co-safe linear temporal logic (LTL) specifications that are not satisfiable with probability one in a Markov decision process (MDP) model. We provide an overview of the approach to pose the problem as the optimisation of three standard objectives in a trimmed product MDP. Furthermore, we introduce a new approach for optimising the three objectives, in a decreasing order of priority, based on a “nested” value iteration, where one value table is kept for each objective.
Optimal Policy Generation for Partially Satisfiable Co-Safe LTL Specifications
Lacerda, Bruno (University of Birmingham) | Parker, David (University of Birmingham) | Hawes, Nick (University of Birmingham)
We present a method to calculate cost-optimal policies for co-safe linear temporal logic task specifications over a Markov decision process model of a stochastic system. Our key contribution is to address scenarios in which the task may not be achievable with probability one. We formalise a task progression metric and, using multi-objective probabilistic model checking, generate policies that are formally guaranteed to, in decreasing order of priority: maximise the probability of finishing the task; maximise progress towards completion, if this is not possible; and minimise the expected time or cost required. We illustrate and evaluate our approach in a robot task planning scenario, where the task is to visit a set of rooms that may be inaccessible during execution.