Behavior Alignment via Reward Function Optimization

Dec-26-2025, 12:12:27 GMT – Neural Information Processing Systems

Designing reward functions that efficiently guide reinforcement learning (RL) agents toward specific behaviors is a complex task. It is challenging because it requires identifying reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn behavior alignment reward functions. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards.
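For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of classical potential-based reward shaping, in which the shaping term added to the environment reward is gamma * phi(s') - phi(s). It is not the paper's bi-level method. The function names, the potential values, and the toy trajectory are all hypothetical and chosen only for illustration.

```python
# Minimal sketch of classical potential-based reward shaping.
# All names and numbers here are illustrative assumptions, not taken from the paper.

def shaped_reward(r_env, phi_s, phi_next, gamma):
    """Environment reward plus the shaping term gamma * phi(s') - phi(s)."""
    return r_env + gamma * phi_next - phi_s


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


gamma = 0.9
# Hypothetical potentials phi(s_0), ..., phi(s_T) along one trajectory (T = 3 steps).
potentials = [0.0, 1.0, 2.0, 3.0]
# Hypothetical sparse environment rewards for the 3 transitions.
env_rewards = [0.0, 0.0, 1.0]

shaped = [
    shaped_reward(r, potentials[t], potentials[t + 1], gamma)
    for t, r in enumerate(env_rewards)
]

env_return = discounted_return(env_rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope: the two returns differ only by
# gamma**T * phi(s_T) - phi(s_0), which does not depend on the intermediate states.
T = len(env_rewards)
offset = (gamma ** T) * potentials[-1] - potentials[0]
```

The telescoping offset shows why this form of shaping gives denser per-step feedback while changing a trajectory's return only through its endpoint potentials. The abstract's claim is that, despite this property, deploying it can still significantly impair performance in some settings.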

behavior alignment, name change, reward function optimization

Country:
- Asia > Middle East > Jordan (0.07)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)