Behavior Alignment via Reward Function Optimization Dhawal Gupta University of Massachusetts Yash Chandak
–Neural Information Processing Systems
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task.
Feb-16-2026, 08:13:30 GMT
- Country:
  - Asia > Middle East
    - Jordan (0.04)
  - Europe > Romania (0.04)
  - North America
    - Canada > Alberta (0.14)
    - United States
      - California > San Diego County
        - San Diego (0.04)
      - Massachusetts (0.40)
      - Michigan (0.04)
  - Oceania
    - Australia > New South Wales
      - Sydney (0.04)
    - New Zealand > North Island
      - Auckland Region > Auckland (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Energy (0.46)
- Technology: