Safe Reinforcement Learning via Shielding

Alshiekh, Mohammed (University of Texas at Austin) | Bloem, Roderick (Graz University of Technology) | Ehlers, Rüdiger (University of Bremen and DFKI GmbH) | Könighofer, Bettina (Graz University of Technology, Institute for Applied Information Processing and Communications) | Niekum, Scott (University of Texas at Austin) | Topcu, Ufuk (University of Texas at Austin)

Feb-8-2018–AAAI Conferences

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
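The monitor-and-correct behaviour described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: in the paper the shield is synthesized from a temporal logic specification, whereas here the safety check is a hand-written, one-step predicate. The names `make_shield`, `is_safe`, `step`, and `run_episode`, and the toy five-cell environment, are all assumptions introduced for illustration.

```python
import random
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def make_shield(
    is_safe: Callable[[State, Action], bool],
    actions: Sequence[Action],
) -> Callable[[State, Action], Action]:
    """Build a shield that overrides an action only when it is unsafe.

    `is_safe` is a hypothetical stand-in for a synthesized specification.
    """

    def shield(state: State, proposed: Action) -> Action:
        if is_safe(state, proposed):
            return proposed  # safe: the learner's choice passes through unchanged
        safe = [a for a in actions if is_safe(state, a)]
        if not safe:
            raise RuntimeError(f"no safe action in state {state!r}")
        return safe[0]  # deterministic fallback, chosen only for illustration

    return shield


# Toy environment (assumed): positions 0..4 on a line, position 4 is a hazard.
ACTIONS = ("left", "stay", "right")
MOVES = {"left": -1, "stay": 0, "right": 1}


def step(pos: int, action: str) -> int:
    """Move along the line, clamping to the range 0..4."""
    return min(4, max(0, pos + MOVES[action]))


def is_safe(pos: int, action: str) -> bool:
    """One-step safety check: the action must not lead into the hazard."""
    return step(pos, action) != 4


shield = make_shield(is_safe, ACTIONS)


def run_episode(policy, start: int = 0, horizon: int = 50, seed: int = 0):
    """Run a policy with the shield placed between the learner and the environment."""
    rng = random.Random(seed)
    pos = start
    visited = [pos]
    for _ in range(horizon):
        proposed = policy(pos, rng)      # the learner proposes an action
        action = shield(pos, proposed)   # the shield corrects it only if unsafe
        pos = step(pos, action)
        visited.append(pos)
    return visited
```

With a policy that always proposes `"right"`, the shield leaves every proposal untouched until position 3, where it substitutes a safe action, so the hazard cell is never entered. A one-step predicate suffices only for this toy case; a specification over longer horizons would need the shield to track more than the current state.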

artificial intelligence, reinforcement learning, shield


Country:
- Europe > Germany
  - Bremen > Bremen (0.14)
- North America > United States
  - Texas > Travis County > Austin (0.14)

Industry:
- Leisure & Entertainment (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
