Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models

Huihan Liu, Rutav Shah, Shuijing Liu, Jack Pittenger, Mingyo Seo, Yuchen Cui, Yonatan Bisk, Roberto Martín-Martín, Yuke Zhu

arXiv.org Artificial Intelligence 

Deploying robots in human-centric settings like households requires balancing robot autonomy with humans' sense of agency [1, 2, 3, 4, 5, 6]. Full teleoperation offers users fine-grained control but imposes a high cognitive load, whereas fully autonomous robots act independently but often fail to align their actions with nuanced human needs. Assistive teleoperation -- a paradigm in which the human and the robot share control [7, 8, 9, 10] -- has thus emerged as an appealing middle ground. By keeping the user in control of high-level decisions while delegating low-level actions to the autonomous robot, this approach preserves user agency while enhancing overall system performance. Assistive teleoperation is therefore a desirable paradigm for robots serving as reliable partners in human-centric environments, such as assisting individuals with motor impairments [11, 12].

While promising, assistive teleoperation in everyday environments remains challenging. A longstanding challenge is to infer human intent from user control inputs and to assist with the correct actions [8]. This challenge is amplified in real-world settings, where robots must go beyond closed-set intent prediction [13, 14] to handle diverse, open-ended user goals across different contexts and scenes. A key capability the robot must therefore possess is to interpret user control inputs within the visual context and infer intent through commonsense reasoning.
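To make this capability concrete, the minimal sketch below shows one plausible way a vision language model could be queried to pick the candidate intent that best explains recent teleoperation inputs given the current camera view. This is an illustrative assumption, not the paper's implementation: `query_vlm`, `TeleopSnapshot`, the prompt wording, and the fallback logic are all hypothetical stand-ins for whatever multimodal API and data structures a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TeleopSnapshot:
    """One step of teleoperation context (hypothetical structure)."""
    image_path: str                # current RGB frame from the robot camera
    control_inputs: Sequence[str]  # e.g., ["move gripper toward mug", "open gripper"]


def infer_intent(
    snapshot: TeleopSnapshot,
    candidate_intents: Sequence[str],
    query_vlm: Callable[[str, str], str],  # (image_path, prompt) -> model text reply
) -> str:
    """Ask a VLM which candidate intent best explains the user's control inputs.

    `query_vlm` is a placeholder for any multimodal model call (local VLM or
    hosted API); it is an assumption of this sketch, not part of the paper.
    """
    prompt = (
        "A human is teleoperating a robot. Recent control inputs: "
        f"{list(snapshot.control_inputs)}. "
        "Given the attached camera image, which of these intents best "
        f"explains the inputs? Options: {list(candidate_intents)}. "
        "Reply with exactly one option."
    )
    reply = query_vlm(snapshot.image_path, prompt).strip().lower()
    # Match the reply against the option list; fall back to the first
    # candidate if the model's text does not name any option.
    for intent in candidate_intents:
        if intent.lower() in reply:
            return intent
    return candidate_intents[0]
```

In a full system, the candidate intents would themselves be generated open-endedly from the scene rather than fixed in advance, and the inferred intent would be re-estimated as new control inputs arrive.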
