Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment
