Real-Time Execution of Action Chunking Flow Policies
Neural Information Processing Systems
Modern AI systems, especially those interacting with the physical world, increasingly require real-time performance. However, the high latency of state-of-the-art generalist models, including recent vision-language-action models (VLAs), poses a significant challenge. While action chunking has enabled temporal consistency in high-frequency control tasks, it does not fully address the latency problem, leading to pauses or out-of-distribution jerky movements at chunk boundaries. This paper presents a novel inference-time algorithm that enables smooth asynchronous execution of action chunking policies. Our method, real-time chunking (RTC), is applicable to any diffusion- or flow-based VLA out of the box, with no re-training. It generates the next action chunk while executing the current one, "freezing" the actions that are guaranteed to execute and "inpainting" the rest. To test RTC, we introduce a new benchmark of 12 highly dynamic tasks in the Kinetix simulator and evaluate it on 6 challenging real-world bimanual manipulation tasks. Results demonstrate that RTC is fast, performant, and uniquely robust to inference delay, significantly improving task throughput and enabling high success rates in precise tasks, such as lighting a match, even in the presence of significant latency.
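The freeze-and-inpaint bookkeeping described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it only shows which positions of the next chunk are copied from the chunk already executing and which are regenerated. The names `plan_next_chunk`, `steps_executed`, `delay_steps`, and `sample_fn` are hypothetical, actions are reduced to single floats, and the inference delay is assumed to be known in control steps. A real diffusion or flow policy would condition the free positions on the frozen prefix during denoising; here a simple stand-in function plays that role.

```python
def plan_next_chunk(prev_chunk, steps_executed, delay_steps, sample_fn):
    """Toy freeze-and-inpaint bookkeeping across a chunk boundary.

    prev_chunk:     actions (floats) of the chunk currently executing.
    steps_executed: actions of prev_chunk already run when inference starts.
    delay_steps:    control steps that pass while inference runs (assumed known).
    sample_fn:      hypothetical stand-in for the policy; given the frozen
                    prefix and a count, returns that many new actions.
    """
    horizon = len(prev_chunk)
    remaining = prev_chunk[steps_executed:]
    if delay_steps > len(remaining):
        raise ValueError("inference delay exceeds the actions left to execute")

    # These actions will run before the new chunk is ready, so the new
    # chunk must agree with them: they are "frozen".
    frozen = list(remaining[:delay_steps])

    # The rest of the horizon is free to be regenerated ("inpainted"),
    # conditioned on the frozen prefix.
    free = list(sample_fn(frozen, horizon - delay_steps))

    new_chunk = frozen + free
    frozen_mask = [i < delay_steps for i in range(horizon)]
    return new_chunk, frozen_mask


def smooth_continuation(prefix, n_free):
    """Stand-in policy: continue from the last frozen action in small steps."""
    start = prefix[-1] if prefix else 0.0
    return [start + 0.1 * (k + 1) for k in range(n_free)]


prev = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
chunk, mask = plan_next_chunk(prev, steps_executed=3, delay_steps=2,
                              sample_fn=smooth_continuation)
```

In this example the first two entries of `chunk` are copied from `prev` (the actions that execute during inference), and the remaining six are produced by the stand-in, so there is no discontinuity at the point where the new chunk takes over.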
Jun-16-2026, 01:35:14 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.66)
- Industry:
- Energy (0.68)
- Information Technology (0.46)
- Automobiles & Trucks (0.46)
- Transportation > Ground
- Road (0.46)
- Technology:
- Information Technology
- Architecture > Real Time Systems (1.00)
- Artificial Intelligence
- Vision (1.00)
- Robots (1.00)
- Representation & Reasoning (1.00)
- Natural Language (1.00)
- Machine Learning
- Reinforcement Learning (0.68)
- Neural Networks > Deep Learning (0.46)