FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference

Du, Hongchao; Wu, Shangyu; Kharlamova, Arina; Guan, Nan; Xue, Chun Jason

Mar-4-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) face challenges for on-device inference due to high memory demands. Traditional methods to reduce memory usage often compromise performance and lack adaptability. We propose FlexInfer, an optimized offloading framework for on-device inference, addressing these issues with techniques like asynchronous prefetching, balanced memory locking, and flexible tensor preservation.

Although these approaches can improve models' memory efficiency, they inevitably impact the generality performance and still suffer in extreme resource-constrained scenarios [4, 9, 12]. Furthermore, these methods lack the flexibility to vary memory budgets or deployment constraints, requiring adjusting the hyper-parameters, such as quantization or sparsity levels, offering limited choices, and imposing ...
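To make the idea of asynchronous prefetching named in the abstract more concrete, the following is a minimal, generic sketch of overlapping weight loading with computation. It is not FlexInfer's implementation: the function names (`load_layer`, `apply_layer`, `run_with_prefetch`), the list standing in for slow storage, and the toy scale-and-shift "layer" are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def load_layer(storage, index):
    """Hypothetical stand-in for reading one layer's weights from slow storage."""
    return storage[index]


def apply_layer(weights, x):
    """Hypothetical stand-in for a layer's computation: scale, then shift."""
    scale, shift = weights
    return x * scale + shift


def run_with_prefetch(storage, x):
    """Run layers in order, loading layer i+1 in the background while layer i computes."""
    num_layers = len(storage)
    if num_layers == 0:
        return x
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, storage, 0)
        for i in range(num_layers):
            weights = pending.result()  # wait for the current layer's weights
            if i + 1 < num_layers:
                # Start loading the next layer before computing this one.
                pending = pool.submit(load_layer, storage, i + 1)
            x = apply_layer(weights, x)
    return x


# Three toy layers, each a (scale, shift) pair.
layers = [(2.0, 1.0), (0.5, 0.0), (3.0, -1.0)]
result = run_with_prefetch(layers, 1.0)
print(result)
```

In this sketch only the current layer and the one being prefetched are referenced at a time, which is the general reason offloading schemes can run with a memory budget smaller than the full model. A real system would read tensors from disk and manage buffers explicitly.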

Genre:
- Research Report (1.00)

Industry:
- Information Technology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)
