Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization

Nath, Swaroop, Siledar, Tejpalsingh, Muddu, Sankara Sri Raghava Ravindra, Rangaraju, Rupasai, Khadilkar, Harshad, Bhattacharyya, Pushpak, Banerjee, Suman, Patil, Amey, Singh, Sudhanshu Shekhar, Chelliah, Muthusamy, Garera, Nikesh

Apr-18-2024–arXiv.org Artificial Intelligence

Reinforcement Learning from Human Feedback (RLHF) has become a dominant strategy for aligning Language Models (LMs) with human values/goals. The key to the strategy is learning a reward model ($\varphi$) that reflects the latent reward model of humans. While this strategy has proven effective, the training methodology requires a large amount of human preference annotation (usually on the order of tens of thousands) to train $\varphi$. Such large-scale annotation is justifiable when it is a one-time effort and the reward model is universally applicable. However, human goals are subjective and depend on the task, requiring task-specific preference annotations, which can be impractical to fulfill. To address this challenge, we propose a novel approach to infuse domain knowledge into $\varphi$, which reduces the amount of preference annotation required (by $21\times$), omits Alignment Tax, and provides some interpretability. We validate our approach in E-Commerce Opinion Summarization, with a significant reduction in dataset size (to just $940$ samples) while advancing the SOTA ($\sim4$ point ROUGE-L improvement, preferred by humans over SOTA $68\%$ of the time). Our contributions include a novel Reward Modeling technique and two new datasets: PromptOpinSumm (supervised data for Opinion Summarization) and OpinPref (a gold-standard human preference dataset). The proposed methodology opens up avenues for efficient RLHF, making it more adaptable to applications with varying human values. We release the artifacts (Code: github.com/efficient-rlhf. PromptOpinSumm: hf.co/prompt-opin-summ. OpinPref: hf.co/opin-pref) for use under the MIT License.
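The abstract does not specify the form of $\varphi$, so the following is only a minimal sketch of one way a reward model over domain features could be fit from preference pairs. It assumes, purely for illustration, that each summary is already described by three hypothetical domain feature scores and that $\varphi$ is linear in them, trained with the standard Bradley-Terry pairwise loss. The function names, feature values, and hyperparameters are all assumptions and do not come from the paper.

```python
import math
import random


def reward(weights, features):
    """Linear reward phi(x) = w . f(x); f(x) is a vector of (hypothetical) domain feature scores."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(pairs, n_features, lr=0.1, epochs=200, seed=0):
    """Fit weights by SGD on the Bradley-Terry loss -log sigmoid(phi(chosen) - phi(rejected))."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss w.r.t. w is -(1 - sigmoid(margin)) * (chosen - rejected),
            # so a descent step adds a positive multiple of the feature difference.
            grad_scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(n_features):
                weights[i] += lr * grad_scale * (chosen[i] - rejected[i])
    return weights


# Toy preference pairs (made up for illustration):
# (features of the preferred summary, features of the rejected summary)
toy_pairs = [
    ([0.9, 0.8, 0.2], [0.4, 0.5, 0.3]),
    ([0.7, 0.9, 0.1], [0.6, 0.2, 0.4]),
    ([0.8, 0.6, 0.5], [0.3, 0.4, 0.5]),
]

weights = train_reward_model(toy_pairs, n_features=3)
for chosen, rejected in toy_pairs:
    print(reward(weights, chosen) > reward(weights, rejected))
```

With only a handful of weights to learn, a model of this shape needs far fewer preference pairs than a large neural reward model, and the learned weights can be read directly as the importance of each feature. This is one plausible reading of the annotation savings and interpretability the abstract claims, not a description of the authors' implementation.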



Country:
- North America
  - United States
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Asia
  - Singapore (0.04)
  - India > Maharashtra
    - Mumbai (0.04)

Genre:
- Research Report (1.00)
- Overview (0.88)

Industry:
- Education (0.46)
- Information Technology > Services
  - e-Commerce Services (0.61)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
