On Monotonicity in AI Alignment

Gilles Bareilles, Julien Fageot, Lê-Nguyên Hoang, Peva Blanchard, Wassim Bouaziz, Sébastien Rouault, El-Mahdi El-Mhamdi

arXiv.org (Machine Learning)

Comparison-based preference learning has become central to aligning AI models with human preferences. However, these methods can behave counterintuitively. We first observe empirically that, after incorporating a preference for response $y$ over $z$, the model may actually decrease the probability (and reward) of generating $y$ (an observation also made by others). Motivated by this, this paper investigates the root causes of (non-)monotonicity within a general comparison-based preference learning framework that subsumes Direct Preference Optimization (DPO), Generalized Preference Optimization (GPO), and Generalized Bradley-Terry (GBT). Under mild assumptions, we prove that such methods still satisfy what we call local pairwise monotonicity. We also provide several formalizations of monotonicity and identify sufficient conditions under which each is guaranteed, thereby providing a toolbox for evaluating how prone learning models are to monotonicity violations. These results clarify the limitations of current methods and offer guidance for developing more trustworthy preference learning algorithms.
