Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Feb-17-2026, 23:21:17 GMT–Neural Information Processing Systems

In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address uncertainty in preference strength.

large language model, machine learning, reward difference

Country:
- North America > United States > Texas > Brazos County > College Station (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.93)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.68)
    - Reinforcement Learning (0.65)
