Reviews: Learning Task Specifications from Demonstrations

Neural Information Processing Systems 

These specifications can be seen as non-Markovian reward functions. This work is therefore related to inverse reinforcement learning (IRL), which aims to infer an agent's reward function by observing its successive states and actions. By defining the probability of a trajectory given a specification (using the maximum entropy principle), the derivation leads to a posterior distribution over specifications. Two algorithms result from this and make it possible to test the approach on the system presented in the introduction, which motivates the paper.
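
To make the structure of the argument concrete, here is a minimal sketch of how a maximum-entropy-style trajectory likelihood can be turned into a posterior over candidate specifications. It is an illustration under stated assumptions, not the likelihood or the algorithms from the paper: the three-letter alphabet, the fixed horizon, the three candidate specifications, the uniform prior, and the `beta` weight are all hypothetical, and the exponential form used here is a simplified stand-in.

```python
import itertools
import math

# Hypothetical toy setting: a trace is a length-3 sequence over a 3-letter alphabet.
ALPHABET = ("a", "b", "c")
HORIZON = 3
ALL_TRACES = list(itertools.product(ALPHABET, repeat=HORIZON))

# Candidate specifications: Boolean predicates over whole traces.
# They depend on the history, not on a single state, hence "non-Markovian".
SPECS = {
    "eventually_b": lambda t: "b" in t,
    "never_c": lambda t: "c" not in t,
    "a_before_b": lambda t: "b" in t and "a" in t[: t.index("b")],
}


def log_likelihood(trace, spec, beta=2.0):
    """Assumed form: P(trace | spec) proportional to exp(beta * [trace satisfies spec])."""
    # Normalize over every possible trace so the likelihood sums to one.
    log_z = math.log(sum(math.exp(beta * spec(t)) for t in ALL_TRACES))
    return beta * spec(trace) - log_z


def posterior(demos, specs=SPECS, beta=2.0):
    """Posterior over specifications under a uniform prior (Bayes' rule)."""
    log_post = {
        name: sum(log_likelihood(d, spec, beta) for d in demos)
        for name, spec in specs.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post.values())
    weights = {name: math.exp(lp - top) for name, lp in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    demos = [("a", "b", "a"), ("a", "a", "b")]
    for name, prob in posterior(demos).items():
        print(f"{name}: {prob:.3f}")
```

Under this assumed likelihood, a specification that is satisfied by fewer traces has a smaller normalizer, so it receives more posterior mass whenever the demonstrations satisfy it. The sketch enumerates every trace and every candidate, which is only feasible for toy problems.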