FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization
