Behavior Alignment via Reward Function Optimization
Dhawal Gupta (University of Massachusetts), Yash Chandak

Neural Information Processing Systems 

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task.
