Shake-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Manipulations and Liquid Mixing

Khan, Muhammad Haris, Asfaw, Selamawit, Iarchuk, Dmitrii, Cabrera, Miguel Altamirano, Moreno, Luis, Tokmurziyev, Issatay, Tsetserukou, Dzmitry

Jan-12-2025–arXiv.org Artificial Intelligence

This paper introduces Shake-VLA, a Vision-Language-Action (VLA) model-based system designed to enable bimanual robotic manipulation for automated cocktail preparation. The system integrates a vision module for detecting ingredient bottles and reading labels, a speech-to-text module for interpreting user commands, and a language model for generating task-specific robotic instructions. Force-torque (FT) sensors are employed to precisely measure the quantity of liquid poured, ensuring accurate ingredient proportions during the mixing process. The system architecture includes a Retrieval-Augmented Generation (RAG) module for accessing and adapting recipes, an anomaly detection mechanism to address ingredient availability issues, and bimanual robotic arms for dexterous manipulation. Experimental evaluations demonstrated a high success rate across system components: the speech-to-text module achieved a 93% success rate in noisy environments, the vision module attained a 91% success rate in object and label detection in cluttered environments, the anomaly module successfully identified 95% of discrepancies between detected ingredients and recipe requirements, and the system achieved an overall success rate of 100% in preparing cocktails, from recipe formulation to action generation.
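Two of the mechanisms named in the abstract, FT-based pour measurement and the ingredient availability check, can be illustrated with a minimal Python sketch. The abstract does not describe how either is implemented, so everything here is an assumption made for illustration: the function names `poured_volume_ml` and `missing_ingredients`, the use of a vertical force difference converted to mass via standard gravity, the default density of water, and the case-insensitive label matching are all hypothetical and not taken from the paper.

```python
G = 9.81  # standard gravity in m/s^2 (assumed constant for the conversion)


def poured_volume_ml(force_before_n, force_after_n, density_g_per_ml=1.0):
    """Estimate the poured volume from the drop in vertical force on the bottle.

    Hypothetical model: the force difference (in newtons) is attributed
    entirely to the liquid that left the bottle.
    """
    delta_mass_g = (force_before_n - force_after_n) / G * 1000.0
    return delta_mass_g / density_g_per_ml


def missing_ingredients(recipe, detected):
    """Return recipe ingredients that are absent from the detected labels.

    Hypothetical matching rule: names are compared case-insensitively
    after trimming surrounding whitespace.
    """
    available = {name.strip().lower() for name in detected}
    return sorted(name for name in recipe if name.strip().lower() not in available)


if __name__ == "__main__":
    # A 0.4905 N drop corresponds to 50 g, i.e. about 50 ml at water density.
    print(round(poured_volume_ml(9.81, 9.3195), 3))
    # The vision module (here simulated by a list of labels) did not find lime juice.
    print(missing_ingredients(["Rum", "Lime juice", "Mint"], ["rum ", "MINT"]))
```

In a sketch like this, a non-empty result from `missing_ingredients` would be the trigger for adapting the recipe, while the volume estimate would decide when to stop pouring; how the actual system handles either step is not stated in the abstract.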

data mining, large language model, machine learning


Country:
- Europe > Russia (0.18)

Genre:
- Research Report (1.00)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Anomaly Detection (0.56)
  - Artificial Intelligence
    - Robots (1.00)
    - Speech > Speech Recognition (0.56)
    - Natural Language
      - Large Language Model (0.69)
      - Chatbot (0.69)
    - Machine Learning > Neural Networks
      - Deep Learning (0.92)
