Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms