AlphaPO -- Reward shape matters for LLM alignment
