SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
Wang, Hanzhen, Xu, Jiaming, Pan, Jiayi, Zhou, Yongkang, Dai, Guohao
arXiv.org Artificial Intelligence
Pruning accelerates compute-bound models by reducing the amount of computation. It has recently been applied to Vision-Language-Action (VLA) models, but existing methods prune tokens using only local information from the current action and ignore global context from prior actions, which causes a success rate drop of more than 20% and limits the speedup. We observe high similarity across consecutive actions and propose leveraging both local (current) and global (past) information for smarter token selection. We introduce SpecPrune-VLA, a training-free method with two-level pruning and heuristic control: (1) static pruning at the action level, which uses global history and local context to reduce the visual tokens per action; (2) dynamic pruning at the layer level, which prunes tokens per layer based on layer-specific importance; and (3) a lightweight action-aware controller, which classifies actions as coarse-grained or fine-grained (by speed) and adjusts pruning aggressiveness, since fine-grained actions are sensitive to pruning. Experiments on LIBERO show that SpecPrune-VLA achieves a 1.46× speedup on an NVIDIA A800 and a 1.57× speedup on an NVIDIA GeForce RTX 3090 compared with OpenVLA-OFT, with negligible loss in success rate.
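The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-token score lists, the speed threshold, and the keep ratios are all hypothetical, and the union-of-top-k rule is only one plausible way to combine global (previous action) and local (current action) importance. It also covers only the action-level selection and the controller, not the layer-level dynamic pruning.

```python
import math

def top_k_indices(scores, k):
    """Return the indices of the k largest scores, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def is_fine_grained(velocity, speed_threshold=0.05):
    """Hypothetical controller: treat slow motion as a fine-grained action."""
    speed = math.sqrt(sum(v * v for v in velocity))
    return speed < speed_threshold

def select_tokens(global_scores, local_scores, velocity,
                  keep_coarse=0.4, keep_fine=0.8):
    """Keep visual tokens ranked highly by either past or current importance.

    global_scores: per-token importance carried over from the previous action
    local_scores:  per-token importance estimated for the current action
    The keep ratios are assumed values; fine-grained actions prune less.
    """
    n = len(local_scores)
    ratio = keep_fine if is_fine_grained(velocity) else keep_coarse
    k = max(1, round(ratio * n / 2))  # each source contributes up to k tokens
    kept = set(top_k_indices(global_scores, k)) | set(top_k_indices(local_scores, k))
    return sorted(kept)

# Illustrative scores for 10 visual tokens (made-up numbers).
global_scores = [0.9, 0.1, 0.2, 0.8, 0.05, 0.3, 0.1, 0.0, 0.2, 0.1]
local_scores = [0.1, 0.7, 0.1, 0.9, 0.0, 0.2, 0.1, 0.0, 0.1, 0.1]

coarse = select_tokens(global_scores, local_scores, velocity=[0.2, 0.0, 0.1])
fine = select_tokens(global_scores, local_scores, velocity=[0.01, 0.0, 0.02])
print(coarse)
print(len(fine) >= len(coarse))
```

In this sketch a fast (coarse-grained) motion keeps fewer tokens than a slow (fine-grained) one, which mirrors the abstract's point that fine-grained actions are more sensitive to pruning.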
Sep-9-2025
- Genre:
- Research Report (0.82)
- Technology:
- Information Technology > Artificial Intelligence
- Robots (1.00)
- Machine Learning (1.00)
- Natural Language (0.71)
- Representation & Reasoning > Spatial Reasoning (0.68)