C-Evolve: Consensus-based Evolution for Prompt Groups
Li, Tiancheng, Wang, Yuhang, Chen, Zhiyang, Wang, Zijun, Ma, Liyuan, Qi, Guo-jun
arXiv.org Artificial Intelligence
Prompt evolution algorithms offer a powerful paradigm for enhancing AI systems built on closed-source models, yet few works explore whether aggregating the results of multiple prompts into a consensus can push system capability further. In this paper, we introduce Consensus-Evolve (C-Evolve), an evolutionary algorithm that discovers a group of prompts whose outputs, aggregated by majority voting, achieve optimal performance. Specifically, C-Evolve employs an island-based evolutionary algorithm to maintain population diversity, and prompts from distinct islands are selected to form groups whose outputs are aggregated. The key difference from single-individual evolution is a voting score, which evaluates each individual prompt's contribution within groups. We use this score, rather than individual performance, as the fitness for evolution. Consequently, C-Evolve is more likely to produce and retain prompts with high potential to form a high-performing group and to eliminate low-performing ones, gradually improving the group's performance after consensus is reached. Our method achieves state-of-the-art performance across a wide range of tasks, including both open-ended tasks like HotpotQA and closed-ended tasks like MATH. On Qwen3-8B, C-Evolve achieves 70.67% on HotpotQA and 43.88% on IFBench, which are 4.95% and 2.73% higher than GEPA, respectively. With GPT-4.1-mini, accuracy on IFBench improves further to 47.96%, and accuracy on the MATH benchmark reaches 95.33%. These results demonstrate C-Evolve's competitive performance.
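The group-level scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula for the voting score, so the version below assumes it is the mean majority-vote accuracy over every group a prompt takes part in, with one prompt drawn from each island. The function names, the toy prompts, and the precomputed `outputs` table are all hypothetical, and the evolutionary loop itself (mutation, selection, migration between islands) is omitted.

```python
from collections import Counter
from itertools import product


def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def group_accuracy(group, outputs, labels):
    """Fraction of examples where the group's majority vote matches the label."""
    correct = 0
    for i, label in enumerate(labels):
        vote = majority_vote([outputs[prompt][i] for prompt in group])
        correct += vote == label
    return correct / len(labels)


def voting_scores(islands, outputs, labels):
    """Assumed fitness: mean accuracy of all groups a prompt takes part in.

    Groups are formed by taking exactly one prompt from each island.
    This definition is an assumption, not the paper's stated formula.
    """
    totals = {prompt: 0.0 for island in islands for prompt in island}
    counts = {prompt: 0 for prompt in totals}
    for group in product(*islands):
        acc = group_accuracy(group, outputs, labels)
        for prompt in group:
            totals[prompt] += acc
            counts[prompt] += 1
    return {prompt: totals[prompt] / counts[prompt] for prompt in totals}


# Toy data: three islands of two hypothetical prompts each, with the
# answers every prompt produced on four examples already recorded.
islands = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
labels = ["x", "y", "z", "x"]
outputs = {
    "a1": ["x", "y", "z", "x"],
    "a2": ["x", "q", "q", "q"],
    "b1": ["x", "y", "q", "q"],
    "b2": ["q", "q", "z", "x"],
    "c1": ["q", "y", "z", "q"],
    "c2": ["q", "q", "q", "q"],
}

scores = voting_scores(islands, outputs, labels)
best_per_island = [max(island, key=scores.get) for island in islands]
```

Under this assumed definition, a prompt is rewarded for how well the groups containing it perform after voting, not for its accuracy in isolation, so a prompt that corrects the errors of prompts on other islands can outrank one that is individually stronger but redundant.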
Sep-30-2025