Universal Model Routing for Efficient LLM Inference
Jitkrittum, Wittawat, Narasimhan, Harikrishna, Rawat, Ankit Singh, Juneja, Jeevesh, Wang, Zifeng, Lee, Chen-Yu, Shenoy, Pradeep, Panigrahy, Rina, Menon, Aditya Krishna, Kumar, Sanjiv
arXiv.org Artificial Intelligence
The significant advances in the capabilities of large language models (LLMs) have been accompanied by significant increases in inference cost. Model routing is a simple technique for reducing inference cost, wherein one maintains a pool of candidate LLMs and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that represents each LLM as a feature vector, derived from its predictions on a set of representative prompts. Based on this, we detail two effective strategies, relying on cluster-based routing and a learned cluster map, respectively. We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors. Experiments on a range of public benchmarks show the effectiveness of the proposed strategies in routing amongst more than 30 unseen LLMs.
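The cluster-based idea in the abstract can be illustrated with a minimal sketch. This is one possible reading and not the authors' implementation: it assumes prompt embeddings and cluster centroids are already available, represents each LLM by its mean error per cluster on the representative prompts, and routes by a hypothetical "error plus `lam` times cost" score. All function names, the `lam` trade-off parameter, and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def nearest_cluster(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to x (squared Euclidean distance)."""
    def sq_dist(c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def llm_feature_vector(errors: Sequence[int], clusters: Sequence[int],
                       num_clusters: int) -> List[float]:
    """Per-cluster mean error of one LLM on the representative prompts.

    errors[i] is 1 if the LLM answered prompt i incorrectly, else 0;
    clusters[i] is the cluster index of prompt i.
    """
    totals = [0.0] * num_clusters
    counts = [0] * num_clusters
    for e, k in zip(errors, clusters):
        totals[k] += e
        counts[k] += 1
    # Assumption: an empty cluster falls back to the LLM's overall error rate.
    overall = sum(errors) / len(errors)
    return [totals[k] / counts[k] if counts[k] else overall
            for k in range(num_clusters)]


def route(prompt_embedding: Sequence[float],
          centroids: Sequence[Sequence[float]],
          features: Dict[str, List[float]],
          costs: Dict[str, float],
          lam: float) -> str:
    """Pick the LLM with the lowest (estimated error + lam * cost) in the prompt's cluster."""
    k = nearest_cluster(prompt_embedding, centroids)
    return min(features, key=lambda m: features[m][k] + lam * costs[m])


# Toy data (hypothetical): two clusters, four representative prompts.
centroids = [[0.0, 0.0], [10.0, 10.0]]
val_embeddings = [[0.1, 0.2], [0.3, -0.1], [9.8, 10.1], [10.2, 9.9]]
clusters = [nearest_cluster(e, centroids) for e in val_embeddings]

# A previously unseen LLM only needs to be scored on the representative
# prompts to obtain its feature vector; nothing else is refit in this sketch.
features = {
    "small": llm_feature_vector([0, 0, 1, 1], clusters, 2),
    "large": llm_feature_vector([0, 0, 0, 0], clusters, 2),
}
costs = {"small": 1.0, "large": 10.0}

easy_choice = route([0.2, 0.1], centroids, features, costs, lam=0.05)
hard_choice = route([9.9, 10.0], centroids, features, costs, lam=0.05)
```

In this toy setup, prompts near the first centroid go to the cheaper model because both models have the same estimated error there, while prompts near the second centroid go to the larger model because its lower estimated error outweighs its cost penalty.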
Feb-12-2025
- Country:
- Asia
- British Indian Ocean Territory > Diego Garcia (0.04)
- Japan > Honshū
- Chūbu > Aichi Prefecture > Nagoya (0.04)
- Middle East
- Jordan (0.04)
- Saudi Arabia > Asir Province
- Abha (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Switzerland (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- Mexico
- Mexico City > Mexico City (0.04)
- Quintana Roo > Cancún (0.04)
- United States
- Arizona > Maricopa County
- Scottsdale (0.04)
- California > Monterey County
- Monterey (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- New York > New York County
- New York City (0.04)
- Oceania > Australia
- Genre:
- Research Report (0.65)
- Technology: