Grounded Reinforcement Learning: Learning to Win the Game under Human Commands
Neural Information Processing Systems
From the RL perspective, it is extremely challenging to derive a precise reward function for human preferences, since the commands are abstract and the valid behaviors are highly complicated and multi-modal.
Feb-8-2026, 05:06:40 GMT