Modular Deep Reinforcement Learning with Temporal Logic Specifications

Yuan, Lim Zun; Hasanbeig, Mohammadhosein; Abate, Alessandro; Kroening, Daniel

Sep-23-2019–arXiv.org Artificial Intelligence

We propose an actor-critic, model-free, online Reinforcement Learning (RL) framework for continuous-state, continuous-action Markov Decision Processes (MDPs) in which the reward is highly sparse but carries a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct, on the fly, a synchronised product of the MDP and the finite-state machine. The temporal structure acts as a guide for the RL agent within the product, where a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy. We evaluate our framework in a Mars rover experiment and report the success rate of the synthesised policy.
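As a rough illustration of the synchronised product described above, the following Python sketch pairs an environment state with a finite-state machine state and updates both at every step, so the product is never built in advance. It is not the authors' implementation: the one-dimensional environment, the labels "a" and "b", the task "reach a, then reach b", the reward rule, and all names (FiniteStateMachine, ProductEnv, run_episode) are assumptions made for illustration. The per-automaton-state policy table is likewise only one plausible reading of "modular", with hand-written controllers standing in for trained DDPG modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass
class FiniteStateMachine:
    """Hypothetical task automaton: (state, label) -> next state."""
    initial: int
    accepting: FrozenSet[int]
    transitions: Dict[Tuple[int, str], int]

    def step(self, q: int, label: str) -> int:
        # Stay in the current state when no transition matches the label.
        return self.transitions.get((q, label), q)


class ProductEnv:
    """Tracks the pair (environment state, automaton state) step by step."""

    def __init__(self, env_step: Callable[[float, float], float],
                 labeller: Callable[[float], str],
                 fsm: FiniteStateMachine, start: float):
        self.env_step, self.labeller = env_step, labeller
        self.fsm, self.start = fsm, start
        self.s, self.q = start, fsm.initial

    def reset(self) -> Tuple[float, int]:
        self.s, self.q = self.start, self.fsm.initial
        return self.s, self.q

    def step(self, action: float) -> Tuple[Tuple[float, int], float]:
        self.s = self.env_step(self.s, action)
        next_q = self.fsm.step(self.q, self.labeller(self.s))
        # Assumed reward rule: 1 on first entering an accepting state, else 0.
        entered = next_q in self.fsm.accepting and self.q not in self.fsm.accepting
        self.q = next_q
        return (self.s, self.q), (1.0 if entered else 0.0)


def toy_step(s: float, a: float) -> float:
    # Toy dynamics (assumption): move along a line, clipped to [-2, 2].
    return max(-2.0, min(2.0, s + a))


def toy_label(s: float) -> str:
    # Toy labelling (assumption): region "a" on the right, "b" on the left.
    if s >= 1.0:
        return "a"
    if s <= -1.0:
        return "b"
    return "none"


def run_episode(max_steps: int = 20) -> Tuple[int, float, int]:
    fsm = FiniteStateMachine(initial=0, accepting=frozenset({2}),
                             transitions={(0, "a"): 1, (1, "b"): 2})
    env = ProductEnv(toy_step, toy_label, fsm, start=0.0)
    # One controller per automaton state; stand-ins for trained modules.
    modules: Dict[int, Callable[[float], float]] = {
        0: lambda s: 0.5,    # head right towards "a"
        1: lambda s: -0.5,   # head left towards "b"
    }
    s, q = env.reset()
    total, steps = 0.0, 0
    while q not in fsm.accepting and steps < max_steps:
        (s, q), r = env.step(modules[q](s))
        total += r
        steps += 1
    return q, total, steps


if __name__ == "__main__":
    print(run_episode())
```

The automaton state selects which controller acts, so each module only has to solve one sub-task, while the sparse reward is emitted once when the automaton reaches its accepting state.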

agent, algorithm, automaton state, (13 more...)

Country:
- North America > United States
  - Arizona (0.04)
  - Massachusetts > Hampshire County
    - Amherst (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.48)
