Informed Routing in LLMs: Smarter Token-Level Computation for Faster Inference

Chao Han, Yijuan Liang, Zihao Xuan, Daokuan Wu, Wei Zhang, Xiaoyu Shen

arXiv.org Artificial Intelligence 

The deployment of large language models (LLMs) in real-world applications is increasingly limited by their high inference cost. While recent advances in dynamic token-level computation allocation attempt to improve efficiency by selectively activating model components per token, existing methods rely on greedy routing, a myopic execute-or-skip mechanism that often leads to irreversible information loss and suboptimal token selection. This paper introduces informed routing, a new paradigm that proactively addresses these issues. The key insight is to assess not only a token's immediate importance but also its recoverability, i.e., how well its transformation can be approximated. To this end, we propose the Lightweight Feature Forecaster (LFF), a small predictive module that estimates a unit's output before routing decisions are made. This enables a flexible execute-or-approximate policy that preserves model fidelity while drastically reducing computation. Extensive experiments on both language modeling and reasoning tasks show that informed routing achieves state-of-the-art efficiency-performance trade-offs across multiple sparsity levels. Notably, even without final LoRA fine-tuning, our method matches or surpasses strong baselines that require full fine-tuning, while reducing training time by over 50%.

The emergence of large language models (LLMs) has catalyzed breakthroughs across diverse industries (Su et al., 2022; OpenAI et al., 2024; Rozière et al., 2024; Cai et al., 2025; Zheng et al., 2025).
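The abstract does not specify the LFF's architecture or the routing rule, so the following is only a minimal PyTorch sketch of the execute-or-approximate idea under assumed design choices: a single linear layer serves as the forecaster, a linear scorer ranks tokens, and the top-k tokens per sequence are fully executed while the rest receive the forecast. The class and parameter names (`InformedRoutingBlock`, `keep_ratio`) are illustrative, not taken from the paper.

```python
# Hypothetical sketch of informed routing with a Lightweight Feature
# Forecaster (LFF). Names, the scoring rule, and the mixing strategy are
# assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn


class InformedRoutingBlock(nn.Module):
    """Wraps a transformer unit with an execute-or-approximate policy."""

    def __init__(self, block: nn.Module, d_model: int, keep_ratio: float = 0.5):
        super().__init__()
        self.block = block                       # full (expensive) unit
        self.lff = nn.Linear(d_model, d_model)   # cheap forecast of the block's output
        self.router = nn.Linear(d_model, 1)      # per-token execution score
        self.keep_ratio = keep_ratio             # fraction of tokens fully executed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        approx = self.lff(x)                     # forecasted output for every token
        scores = self.router(x).squeeze(-1)      # (batch, seq_len)

        # Select the top-k tokens for full execution; the rest receive the
        # LFF approximation instead of being skipped outright.
        k = max(1, int(self.keep_ratio * x.size(1)))
        topk = scores.topk(k, dim=1).indices
        execute = torch.zeros_like(scores, dtype=torch.bool)
        execute.scatter_(1, topk, True)

        # For clarity this sketch runs the block on all tokens and mixes;
        # a real implementation would compute it only for selected tokens.
        full = self.block(x)
        return torch.where(execute.unsqueeze(-1), full, approx)


# Example usage with a standard encoder layer as the wrapped unit.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
routed = InformedRoutingBlock(layer, d_model=512, keep_ratio=0.5)
y = routed(torch.randn(2, 128, 512))             # (2, 128, 512)
```

Note that the savings here are only notional, since the sketch still executes the full block everywhere; an actual implementation would gather only the selected tokens before calling the block, and would presumably train the forecaster (e.g., with a regression loss toward the block's true output) so that approximated tokens lose as little information as possible.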
