LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

Wu, Yuanchen; Verma, Saurabh; Lee, Justin; Xiong, Fangzhou; Zhang, Poppy; Awadelkarim, Amel; Chen, Xu; Yuan, Yubai; Hill, Shawndra

Oct-17-2025, arXiv.org Machine Learning

Large language models (LLMs) are highly sensitive to their input prompts, making prompt design a central challenge. While automatic prompt optimization (APO) reduces manual engineering, most approaches assume access to ground-truth references such as labeled validation data. In practice, however, collecting high-quality labels is costly and slow. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization. PDO formulates the problem as a dueling-bandit setting, where the supervision signal comes from pairwise preference feedback provided by an LLM judge. The framework combines Double Thompson Sampling (D-TS), which prioritizes informative prompt comparisons, with Top-Performer Guided Mutation, which expands the candidate pool by mutating high-performing prompts. PDO naturally operates in label-free settings and can also incorporate partial labels to mitigate judge noise. Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently outperforms baseline methods. Ablation studies further demonstrate the effectiveness of both D-TS and prompt mutation.
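The dueling-bandit formulation in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified Thompson-sampling duel selector that omits the confidence-bound pruning of full D-TS, always picks two distinct candidates, and leaves out prompt mutation. The LLM judge is replaced by a simulated one driven by hypothetical hidden quality scores, and all function names (`select_duel`, `run_duels`, `copeland_winner`) are illustrative assumptions.

```python
import random


def select_duel(wins, rng):
    """Pick two distinct candidates by sampling pairwise win rates.

    wins[i][j] counts how often candidate i beat candidate j.
    Simplified relative to full D-TS: no confidence-bound pruning.
    """
    k = len(wins)
    # Sample a plausible win probability for each pair from a Beta posterior.
    theta = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            theta[i][j] = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            theta[j][i] = 1.0 - theta[i][j]
    # First candidate: beats the most others under the sampled win rates.
    copeland = [sum(theta[i][j] > 0.5 for j in range(k) if j != i) for i in range(k)]
    best = max(copeland)
    first = rng.choice([i for i in range(k) if copeland[i] == best])
    # Second candidate: resample against the first and take the strongest challenger.
    challenger = {
        j: rng.betavariate(wins[j][first] + 1, wins[first][j] + 1)
        for j in range(k)
        if j != first
    }
    second = max(challenger, key=challenger.get)
    return first, second


def run_duels(quality, rounds, seed=0):
    """Run duels with a simulated judge (assumption: stands in for the LLM judge).

    quality holds hypothetical hidden scores; candidate a beats b with
    probability quality[a] / (quality[a] + quality[b]).
    """
    rng = random.Random(seed)
    k = len(quality)
    wins = [[0] * k for _ in range(k)]
    for _ in range(rounds):
        a, b = select_duel(wins, rng)
        p_a = quality[a] / (quality[a] + quality[b])
        if rng.random() < p_a:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins


def copeland_winner(wins):
    """Return the candidate with the most pairwise majorities in the observed duels."""
    k = len(wins)
    scores = [sum(wins[i][j] > wins[j][i] for j in range(k) if j != i) for i in range(k)]
    return max(range(k), key=scores.__getitem__)


if __name__ == "__main__":
    wins = run_duels(quality=[1.0, 2.0, 4.0, 8.0], rounds=300)
    print("estimated best candidate:", copeland_winner(wins))
```

In a real setting the simulated judge would be replaced by a call that asks an LLM which of two prompts produced the better output. Sampling from the posterior tends to spend later comparisons on candidates that still look competitive, which is the intuition behind the sample efficiency the abstract claims.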



