SGPO: Self-Generated Preference Optimization based on Self-Improver
