Energy-Efficient Vision Transformer Inference for Edge-AI Deployment
Amanzhol, Nursultan, Park, Jurn-Gyu
arXiv.org Artificial Intelligence
Abstract--The growing deployment of Vision Transformers (ViTs) on energy-constrained devices requires evaluation methods that go beyond accuracy alone. We present a two-stage pipeline for assessing ViT energy efficiency that combines device-agnostic model selection with device-related measurements. The device-agnostic stage uses the NetScore metric for screening; the device-related stage ranks models with the Sustainable Accuracy Metric (SAM). Results show that hybrid models such as LeViT_Conv_192 reduce energy by up to 53% on TX2 relative to a ViT baseline (e.g., SAM5=1.44 on TX2/CIFAR-10), while distilled models such as TinyViT-11M_Distilled excel on the mobile GPU (e.g., SAM5=1.72 on RTX 3050/CIFAR-10 and SAM5=0.76 on RTX 3050/ImageNet-1K).

Recently, Vision Transformers (ViTs) have emerged as the state of the art in many computer vision tasks, from image classification to object detection [1].
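The device-agnostic screening stage relies on NetScore, a previously published metric that balances accuracy against parameter count and compute. A minimal sketch of the standard NetScore formulation follows, assuming the default exponents (κ=2, β=0.5, γ=0.5) from the original NetScore proposal; the paper's exact configuration, and its SAM metric, are not reproduced here.

```python
import math

def netscore(accuracy_pct: float, params_m: float, macs_g: float,
             kappa: float = 2.0, beta: float = 0.5, gamma: float = 0.5) -> float:
    """Standard NetScore: Omega(N) = 20 * log10(a^kappa / (p^beta * m^gamma)).

    accuracy_pct -- top-1 accuracy in percent
    params_m     -- parameter count, in millions
    macs_g       -- multiply-accumulate operations, in billions
    Exponents default to the values from the original NetScore
    definition; the paper may use different settings (assumption).
    """
    return 20.0 * math.log10(
        accuracy_pct ** kappa / (params_m ** beta * macs_g ** gamma)
    )
```

A higher NetScore indicates more accuracy delivered per unit of model size and compute, which is why it is suitable for device-agnostic screening before any on-device energy measurement.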
Dec-1-2025
- Country:
- Asia > Kazakhstan > Akmola Region > Astana (0.04)
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Energy (0.94)
- Technology:
- Information Technology > Artificial Intelligence > Vision (1.00)