Direct Preference-based Policy Optimization without Reward Modeling

Nov-19-2025, 22:18:48 GMT–Neural Information Processing Systems

We propose a preference-based reinforcement learning (PbRL) algorithm that learns directly from preferences, without requiring any reward modeling.

large language model, machine learning, reinforcement learning

Country:
- Asia > South Korea > Seoul > Seoul (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.68)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (0.94)
