Iterative Foundation Model Fine-Tuning on Multiple Rewards

Ghari, Pouya M., Sciabola, Simone, Wang, Ye

Nov-4-2025–arXiv.org Artificial Intelligence

Fine-tuning foundation models has emerged as a powerful approach for generating objects with specific desired properties. Reinforcement learning (RL) provides an effective framework for this purpose, enabling models to generate outputs that maximize a given reward function. However, in many applications such as text generation and drug discovery, it can be suboptimal to optimize using a single reward signal, as multiple evaluation criteria are often necessary. This paper proposes a novel reinforcement learning-based method for fine-tuning foundation models using multiple reward signals. By employing an iterative fine-tuning strategy across these rewards, our approach generalizes state-of-the-art RL-based methods. We further provide a theoretical analysis that offers insights into the performance of multi-reward RL fine-tuning. Experimental results across diverse domains including text, biological sequence, and small molecule generation, demonstrate the effectiveness of the proposed algorithm compared to state-of-the-art baselines.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

Nov-4-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.89)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.70)
  - Representation & Reasoning > Optimization (0.67)
  - Machine Learning
    - Reinforcement Learning (0.87)
    - Neural Networks > Deep Learning (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found