Inverse Preference Learning: Preference-based RL without a Reward Function

Neural Information Processing Systems 

Reward functions are difficult to design and often hard to align with human intent.
