SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling
