Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization

Open in new window