Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning
Joseph Early, Tom Bewley, Christine Evers
Neural Information Processing Systems
In this work, we remove the assumption that rewards are Markovian, extending reward modelling (RM) to capture temporal dependencies in human assessment of trajectories.
Aug-17-2025, 20:21:13 GMT