SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Neural Information Processing Systems
Additionally, we introduce a target reward margin to the Bradley-Terry objective to encourage a larger margin between the winning and losing responses, further improving the algorithm's performance.
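The margin idea in the sentence above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the `beta` scaling, and the use of an average per-token log-probability as the reference-free reward are assumptions based on the commonly cited form of the SimPO objective, and the excerpt itself states only that a target reward margin is added to the Bradley-Terry objective.

```python
import math


def length_normalized_reward(token_logps, beta=1.0):
    """Hypothetical reference-free reward: beta times the mean token log-probability.

    Assumption: the reward uses only the policy's own log-probabilities,
    averaged over the response length, with no reference model involved.
    """
    return beta * sum(token_logps) / len(token_logps)


def bt_margin_loss(reward_w, reward_l, gamma=0.0):
    """Bradley-Terry negative log-likelihood with a target reward margin.

    Computes -log(sigmoid(reward_w - reward_l - gamma)) in a numerically
    stable way. With gamma > 0, the winning reward must exceed the losing
    reward by more than gamma before the loss drops below log(2).
    """
    z = reward_w - reward_l - gamma
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Example: per-token log-probabilities of a preferred and a rejected response.
r_w = length_normalized_reward([-0.2, -0.4, -0.3], beta=2.0)
r_l = length_normalized_reward([-0.9, -1.1], beta=2.0)
loss_no_margin = bt_margin_loss(r_w, r_l, gamma=0.0)
loss_with_margin = bt_margin_loss(r_w, r_l, gamma=0.5)
```

For the same pair of rewards, the loss with a positive `gamma` is always larger than the loss with `gamma = 0`, which is what pushes the optimizer toward a wider gap between winning and losing responses.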
Oct-10-2025, 19:12:16 GMT
- Country:
- Europe > Italy
- Calabria > Catanzaro Province
- Catanzaro (0.04)
- Tuscany > Florence (0.04)
- North America > United States
- Texas > Travis County
- Austin (0.04)
- Virginia (0.04)
- South America
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Industry:
- Education (0.45)
- Technology: