Explaining Reward Functions in Markov Decision Processes

Russell, Jacob (Dartmouth College) | Santos, Eugene (Dartmouth College)

AAAI Conferences 

Rewards in Markov Decision Processes (MDPs) define the behavior of the model. Without a clear interpretation of what the reward function is and is not capturing, one can neither trust the model nor diagnose when it is giving incorrect recommendations. The increasing complexity of state-of-the-art models used to represent the reward function, along with model-free methods that attempt to avoid representing this function, makes trusting the model much more difficult. We map these reward functions onto a standard classification problem, which lets us explain what factors the model considers when making decisions in both local and global contexts and quantify whether the fit of the reward function is likely to be good for explaining the behavior of the model. We evaluate our proof of concept on both the standard Object World domain and a version modified to add more nonlinearity.
