Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression

Sander, Jacob, Moe, David, Cohen, Achraf, Venable, Brent, Dasari, Venkat, Jalaian, Brian

May-27-2025–arXiv.org Artificial Intelligence

Modern foundational models are often compressed via a combination of structured pruning and re-training to meet the strict compute, memory, and connectivity constraints of edge deployments. While state-of-the-art pruning schemes target the entire Transformer, we adopt a simple, layer-wise L2-norm pruning on only the MLP blocks as a fixed baseline. Our focus is not on achieving maximal compression, but on isolating the impact of the re-training loss function: (i) Fine-tuning with Cross- Entropy (L2PFT), which requires labeled data, versus (ii) Self-Distillation with KL-divergence, which leverages only teacher logits (no labels) (L2PSD). We evaluate both pipelines on the OLMo2- 7B-SFT model for CommonsenseQA suitable for intermittent or denied connectivity scenarios typical of edge networks. Under identical pruning schedules, KL-based distillation matches or exceeds CE fine-tuning in test accuracy, demonstrating that, even with a basic MLP-only pruning, the choice of loss function materially affects compressed model recovery in resource-constrained environments.

large language model, machine learning, pruning, (18 more...)

arXiv.org Artificial Intelligence

May-27-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - Florida > Escambia County
    - Pensacola (0.05)
- Europe > Italy
  - Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (1.00)

Industry:
- Education (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.94)
  - Natural Language > Large Language Model (0.71)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found