RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models
arXiv.org Artificial Intelligence
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities and promising potential in solving complex robotic manipulation tasks. However, their substantial parameter sizes and high inference latency pose significant challenges for real-world deployment, particularly on resource-constrained robotic platforms. To address this issue, we begin by conducting an extensive empirical study to explore the effectiveness of model compression techniques when applied to VLAs. Building on the insights gained from these preliminary experiments, we propose RLRC, a three-stage recovery method for compressed VLAs, including structured pruning, performance recovery based on SFT and RL, and further quantization. RLRC achieves up to an 8× reduction in memory usage and a 2.3× improvement in inference throughput, while maintaining or even surpassing the original VLA's task success rate. Extensive experiments show that RLRC consistently outperforms existing compression baselines, demonstrating strong potential for on-device deployment of VLAs.

I. INTRODUCTION

Recent advances in robot learning have produced breakthroughs in both the accuracy and the generalization of robotic policies for task execution. Since the introduction of RT-2 [1], Vision-Language-Action (VLA) models have attracted increasing attention. These models, built upon large foundation models, exhibit strong generalization capabilities, suggesting a promising path toward general-purpose robots capable of performing a wide range of manipulation tasks. VLA models leverage the general knowledge embedded in pretrained Vision-Language Models (VLMs) and can comprehend language instructions, perceive the visual environment, and generate appropriate actions [2][3][4].
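To make the compression stages named in the abstract more concrete, the following is a minimal toy sketch of structured pruning followed by quantization on a small weight matrix. It is not the RLRC implementation: the function names (`prune_rows`, `quantize`, `memory_bits`), the norm-based pruning criterion, the 50% keep ratio, and the 16-bit to 4-bit conversion are all illustrative assumptions, and the SFT and RL recovery stage between the two steps is omitted entirely.

```python
# Illustrative sketch only; names, ratios, and bit widths are assumptions,
# not taken from the paper. The SFT/RL recovery stage is not modeled.

def prune_rows(weights, keep_ratio):
    """Structured pruning: drop whole rows (e.g. neurons) with the smallest L2 norm."""
    norms = [sum(w * w for w in row) ** 0.5 for row in weights]
    n_keep = max(1, round(len(weights) * keep_ratio))
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original row order
    return [weights[i] for i in keep]

def quantize(weights, bits):
    """Symmetric per-tensor quantization to signed integers of width `bits`."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(w / scale))) for w in row] for row in weights]
    return q, scale

def memory_bits(n_rows, n_cols, bits):
    """Storage for the weights alone (ignores the quantization scale)."""
    return n_rows * n_cols * bits

# Hypothetical 4x4 layer, assumed to be stored in 16-bit precision.
weights = [
    [0.9, -0.8, 0.7, 0.6],
    [0.01, 0.02, -0.01, 0.0],
    [-0.5, 0.4, 0.3, -0.2],
    [0.02, -0.03, 0.01, 0.02],
]

pruned = prune_rows(weights, keep_ratio=0.5)   # stage 1: structured pruning
q, scale = quantize(pruned, bits=4)            # stage 3: quantization

original_bits = memory_bits(len(weights), len(weights[0]), 16)
compressed_bits = memory_bits(len(q), len(q[0]), 4)
reduction = original_bits / compressed_bits
print(f"memory reduction: {reduction:.1f}x")
```

In this toy setting, halving the number of rows and moving from 16-bit to 4-bit weights multiplies to an 8-fold reduction in weight storage. That figure follows from the illustrative choices made here and should not be read as the configuration behind the paper's reported numbers.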
Jun-24-2025