Synthesising Reinforcement Learning Policies through Set-Valued Inductive Rule Learning

Youri Coppens, Denis Steckelmacher, Catholijn M. Jonker, Ann Nowé

Jun-10-2021–arXiv.org Artificial Intelligence

Today's advanced Reinforcement Learning algorithms produce black-box policies that are often difficult for a person to interpret and trust. We introduce a policy distillation algorithm, building on the CN2 rule mining algorithm, that distills the policy into a rule-based decision system. At the core of our approach is the fact that an RL process does not just learn a policy, a mapping from states to actions, but also produces extra meta-information, such as action values indicating the quality of alternative actions. This meta-information can indicate whether more than one action is near-optimal for a given state. We extend CN2 so that it can leverage knowledge about equally good actions to distill the policy into fewer rules, making it easier for a person to interpret. Then, to ensure that the rules explain a valid, non-degenerate policy, we introduce a refinement algorithm that fine-tunes the rules to obtain good performance when executed in the environment. We demonstrate the applicability of our algorithm on the Mario AI benchmark, a complex task that requires modern reinforcement learning algorithms, including neural networks. The explanations we produce capture the learned policy in only a few rules, which allow a person to understand what the black-box agent learned.
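The central idea above is that action values can reveal when several actions are near-optimal, and that a rule learner which accepts any of them needs fewer rules. The minimal Python sketch below illustrates this under stated assumptions. It assumes tabular Q-values over discrete state features and treats an action as near-optimal when its value lies within a fixed absolute tolerance of the best value. The tolerance criterion and the helper names `near_optimal_actions` and `rule_quality` are hypothetical, not the authors' exact method. The sketch only shows how a single rule is scored against set-valued labels. It does not implement CN2's full rule search or the refinement step.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical encoding: a state is a tuple of discrete feature values.
State = Tuple[int, ...]


def near_optimal_actions(q_values: Sequence[float], tolerance: float) -> FrozenSet[int]:
    """Return every action whose value is within `tolerance` of the best one.

    The absolute-tolerance criterion is an illustrative assumption.
    """
    best = max(q_values)
    return frozenset(a for a, q in enumerate(q_values) if best - q <= tolerance)


def rule_quality(
    condition: Dict[int, int],
    action: int,
    data: List[Tuple[State, FrozenSet[int]]],
) -> Tuple[int, int]:
    """Count (covered, correct) examples for the rule `IF condition THEN action`.

    `condition` maps a feature index to the value it must take. With
    set-valued labels, a covered state counts as correct whenever the
    rule's action is one of that state's acceptable actions.
    """
    covered = [labels for state, labels in data
               if all(state[f] == v for f, v in condition.items())]
    correct = sum(1 for labels in covered if action in labels)
    return len(covered), correct


if __name__ == "__main__":
    # Toy Q-table with two binary features and three actions (made-up numbers).
    q_table = {
        (0, 0): [1.00, 0.98, 0.20],
        (0, 1): [0.50, 0.90, 0.89],
        (1, 0): [0.30, 0.31, 1.00],
        (1, 1): [0.10, 0.20, 1.00],
    }
    # Set-valued labels: all near-optimal actions per state.
    set_data = [(s, near_optimal_actions(q, tolerance=0.05)) for s, q in q_table.items()]
    # Single-valued labels: only the greedy (argmax) action per state.
    argmax_data = [(s, frozenset({max(range(len(q)), key=q.__getitem__)}))
                   for s, q in q_table.items()]

    rule = ({0: 0}, 1)  # IF feature_0 == 0 THEN action 1
    print(rule_quality(*rule, set_data))     # (2, 2)
    print(rule_quality(*rule, argmax_data))  # (2, 1)
```

With argmax labels, the rule `IF feature_0 == 0 THEN action 1` is correct on only one of the two states it covers. With set-valued labels it is correct on both, so one rule suffices where a single-label learner would have to split that region further. A CN2-style learner would then turn such counts into a rule-quality score (for example, Laplace accuracy) while searching for rules.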
