Cost-Aware Contrastive Routing for LLMs
Reza Shirkavand, Shangqian Gao, Peiran Yu, Heng Huang
arXiv.org Artificial Intelligence
We study cost-aware routing for large language models across diverse and dynamic pools of models. Existing approaches often overlook prompt-specific context, rely on expensive model profiling, assume a fixed set of experts, or use inefficient trial-and-error strategies. We introduce Cost-Spectrum Contrastive Routing (CSCR), a lightweight framework that maps both prompts and models into a shared embedding space to enable fast, cost-sensitive selection. CSCR uses compact, fast-to-compute logit footprints for open-source models and perplexity fingerprints for black-box APIs. A contrastive encoder is trained to favor the cheapest accurate expert within adaptive cost bands. At inference time, routing reduces to a single k-NN lookup via a FAISS index, requiring no retraining when the expert pool changes and enabling microsecond latency. Across multiple benchmarks, CSCR consistently outperforms baselines, improving the accuracy-cost tradeoff by up to 25%, while generalizing robustly to unseen LLMs and out-of-distribution prompts.
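The inference step described above — embed the prompt, retrieve the k nearest expert embeddings, and pick the cheapest among them — can be sketched as follows. The paper performs the lookup with a FAISS index; this minimal sketch substitutes a brute-force NumPy nearest-neighbor search, and all names (`route`, the random embeddings, the cost vector) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def route(prompt_emb, model_embs, model_costs, k=3):
    """Return the index of the cheapest expert among the k nearest
    model embeddings to the prompt embedding.
    (The paper uses a FAISS index for this k-NN lookup; plain NumPy
    distance computation stands in here for illustration.)"""
    dists = np.linalg.norm(model_embs - prompt_emb, axis=1)
    nearest = np.argsort(dists)[:k]          # k nearest experts
    return nearest[np.argmin(model_costs[nearest])]  # cheapest of those

# Hypothetical pool of 8 experts with 16-dim embeddings and per-call costs.
rng = np.random.default_rng(0)
model_embs = rng.normal(size=(8, 16))
costs = np.array([1.0, 0.2, 0.5, 2.0, 0.1, 0.8, 1.5, 0.3])
prompt_emb = rng.normal(size=16)

choice = route(prompt_emb, model_embs, costs)
```

Because the index stores only per-model embeddings, adding or removing an expert just means adding or removing a vector — no retraining of the encoder is required, which is what enables the dynamic-pool property claimed in the abstract.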
Nov-25-2025