EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation

Chopra, Samarth, McMoil, Alex, Carnovale, Ben, Sokolson, Evan, Kubendran, Rajkumar, Dickerson, Samuel

arXiv.org Artificial Intelligence 

Abstract-- While Vision-Language-Action (VLA) models map visual inputs and language instructions directly to robot actions, they often rely on costly hardware and struggle in novel or cluttered scenes. We introduce EverydayVLA, a 6-DOF manipulator that can be assembled for $300, capable of modest payloads and workspaces. A single unified model jointly outputs discrete and continuous actions, and our adaptive-horizon ensembler monitors motion uncertainty to trigger on-the-fly replanning for safe, reliable operation. On LIBERO, Ev-erydayVLA matches state-of-the-art success rates, and in real-world tests it outperforms prior methods by 49% in-distribution and 34.9% out-of-distribution. By combining a state-of-the-art VLA with cost-effective hardware, EverydayVLA democratizes access to a robotic foundation model, and paves the way for economical use in homes and research labs alike.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found