Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning
Joseph Early, Tom Bewley, Christine Evers

Neural Information Processing Systems 

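Existing work on reward modelling (RM) assumes that human evaluators observe each step of a trajectory independently when providing feedback on agent behaviour. In this work, we remove this assumption, extending RM to capture temporal dependencies in human assessment of trajectories.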
