LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

Open in new window