The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants

Zhang, Yiqun, Li, Hao, Wang, Chenxu, Chen, Linyao, Zhang, Qiaosheng, Ye, Peng, Feng, Shi, Wang, Daling, Wang, Zhen, Wang, Xinrun, Xu, Jia, Bai, Lei, Ouyang, Wanli, Hu, Shuyue

Jun-19-2025–arXiv.org Artificial Intelligence

Proprietary giants are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers -- a simple recipe that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are then selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, GPT-4.1, and GPT-4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. Furthermore, the Avengers delivers superior out-of-distribution generalization and remains robust across various embedding models, clustering algorithms, ensemble strategies, and values of its sole parameter -- the number of clusters.
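The four operations and the inference-time routing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the hashed bag-of-words `embed` function stands in for a real text embedding model, plain k-means stands in for whichever clustering algorithm is used, only a single top model is selected per cluster, and all names (`embed`, `kmeans`, `score_models`, `answer`, the `generators` callables) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def embed(text, dim=16):
    """Toy stand-in for a text embedding model (assumption, not the paper's):
    hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(ch) for ch in word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; k (the number of clusters) is the only tunable value."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def score_models(centroids, queries, correct):
    """Per-cluster accuracy of each model.
    correct[model][j] is True if that model solved validation query j."""
    assign = [nearest(embed(q), centroids) for q in queries]
    scores = {}
    for model, flags in correct.items():
        per_cluster = []
        for c in range(len(centroids)):
            hits = [flags[j] for j in range(len(queries)) if assign[j] == c]
            per_cluster.append(sum(hits) / len(hits) if hits else 0.0)
        scores[model] = per_cluster
    return scores


def answer(query, centroids, scores, generators, n_samples=5):
    """Route the query to the best model of its nearest cluster,
    sample it repeatedly, and return the majority answer."""
    cluster = nearest(embed(query), centroids)
    best = max(scores, key=lambda m: scores[m][cluster])
    samples = [generators[best](query) for _ in range(n_samples)]
    return best, Counter(samples).most_common(1)[0][0]
```

In this sketch, `generators` maps a model name to any callable that returns an answer string for a query; in practice each callable would wrap a sampled generation from one of the smaller models, and the majority vote would be taken over those sampled answers.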

Country:
- Asia (0.92)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.90)
  - Machine Learning
    - Statistical Learning > Clustering (1.00)
    - Neural Networks > Deep Learning (0.68)
