Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization

Nguyen, Le-Trung, Tartaglione, Enzo, Nguyen, Van-Tam

arXiv.org Artificial Intelligence 

As AI increasingly shapes daily life, energy consumption and data privacy have become pressing concerns. However, the expanding scale of modern neural networks creates a major obstacle for on-device training. Although prior work has concentrated on compact convolutional architectures, we instead apply subspace-based training to transformer models. Motivated by the idea that a model's essential information lies in a fixed subspace, we introduce Weight-Activation Subspace Iteration (WASI), a method that mitigates the memory bottleneck of backpropagation and boosts inference efficiency in transformer models by restricting training to this subspace. Our results demonstrate that WASI maintains accuracy comparable to vanilla training while reducing memory usage by up to 62× and computational cost (FLOPs) by up to 2×. On a Raspberry Pi 5, WASI achieves roughly 1.5× faster training and inference than vanilla training.

On-device learning has recently emerged as a promising research direction, enabling deep learning models to be fine-tuned directly on resource-constrained edge devices. This approach addresses critical issues such as privacy and energy consumption, improves scalability, and places control of AI capabilities directly "in user's hands" (Dhar et al., 2021). Prior work on on-device learning has largely focused on vision tasks using convolutional neural network models, primarily because of their compact architectures (Lin et al., 2022; Nguyen et al., 2024; Yang et al., 2023b; Quélennec et al., 2024; Bragagnolo et al., 2022; Nguyen et al., 2025). In many real-world applications, however, transformer-based models have become the de facto choice due to their unique architectural mechanisms (Vaswani et al., 2017).
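To make the core idea concrete, the following is a minimal NumPy sketch of training restricted to a fixed subspace, not the paper's actual WASI iteration: a pretrained weight W0 is frozen, its top-k singular vectors define the subspace, and only a small k×k matrix A is trained inside it. The dimensions, learning rate, and toy regression loss are illustrative assumptions. Note the memory angle: for the trainable branch, backpropagation only needs the k-dimensional projection z = Vk @ x, not the full d_in-dimensional activation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 64, 8   # illustrative sizes; k << d gives the savings

# Frozen pretrained weight and its fixed rank-k subspace (top singular vectors)
W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
U, s, Vt = np.linalg.svd(W0, full_matrices=False)
Uk, Vk = U[:, :k], Vt[:k, :]          # subspace bases, never updated

A = np.zeros((k, k))                   # the only trainable parameter (k*k values)

def forward(x):
    z = Vk @ x                         # k-dim projection: all we must cache for backprop
    y = W0 @ x + Uk @ (A @ z)          # frozen path + trainable subspace correction
    return y, z

# One gradient step on a toy squared-error regression target
x = rng.standard_normal(d_in)
t = rng.standard_normal(d_out)
y, z = forward(x)
g_y = y - t                            # dL/dy for L = 0.5 * ||y - t||^2
g_A = np.outer(Uk.T @ g_y, z)          # gradient lives entirely in the k x k subspace
A -= 0.05 * g_A                        # small step; loss on (x, t) decreases

y2, _ = forward(x)
loss_before = np.sum((y - t) ** 2)
loss_after = np.sum((y2 - t) ** 2)     # smaller than loss_before
```

The trainable state and cached activation scale with k rather than with the layer width, which is the mechanism behind the memory and FLOP reductions claimed above; the paper's contribution is how the weight-activation subspace is chosen and iterated, which this sketch does not reproduce.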
