Mechanism Design for LLM Fine-tuning with Multiple Reward Models
