STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
