Grounded Reinforcement Learning: Learning to Win the Game under Human Commands

Neural Information Processing Systems 

From the RL perspective, it is extremely challenging to derive a precise reward function for human preferences, since the commands are abstract and the valid behaviors are highly complicated and multi-modal.
