D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models

May-26-2025, 14:58:22 GMT–Neural Information Processing Systems

Large language models (LLMs) have had an impressive societal impact owing to their excellent understanding and logical reasoning skills. However, this ability relies on a huge amount of computing resources, which makes it difficult to deploy LLMs on resource-constrained platforms. Current LLMs process every token in the same way, but we argue that not every word is equally important. Some words, particularly dispensable terms in simple questions, should not be allocated excessive computing resources. In this paper, we propose a novel dynamic inference paradigm for LLMs, namely D-LLMs, which adaptively allocates computing resources in token processing.

artificial intelligence, large language model, natural language

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)