Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation

Dec-10-2025–arXiv.org Artificial Intelligence

Abstract: We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared, 6-DoF world model for multilateral teleoperation. By integrating vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3-D Gaussian Splatting (3DGS), TeleAssist provides every operator with real-time global positions and orientations of multiple robots, without fiducials or depth sensors, in an interaction-centric teleoperation setting. Teleoperating robots in complex or remote environments is challenging due to limited on-board perception, occlusions, and operator cognitive load. Traditional teleoperation relies on the robot's own sensors (cameras, LiDAR, IMU), which often suffer from narrow fields of view, occlusions, and cumulative drift, collectively increasing the cognitive load on human operators who must maintain situational awareness. Meanwhile, external camera infrastructure (e.g., CCTV) has the potential to provide complementary visual coverage and global context, but conventional solutions rely heavily on visual fiducials, such as AprilTags or ArUco markers [5], or on motion-capture systems that require controlled lighting and calibration.
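The abstract names weighted-PCA pose extraction but does not spell out the procedure. As a rough illustration only, the sketch below shows one common way weighted PCA can yield a position and a heading axis from a 3-D point set. It assumes the points come from back-projecting a segmented robot mask through a monocular depth map and that each point carries a confidence weight; the function name `weighted_pca_pose`, its parameters, and the power-iteration solver are hypothetical choices, not the authors' implementation.

```python
import math


def weighted_pca_pose(points, weights, iters=200):
    """Estimate (centroid, dominant_axis) of a weighted 3-D point set.

    Hypothetical helper: `points` is a sequence of (x, y, z) tuples and
    `weights` holds one non-negative confidence value per point.
    """
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")

    # Weighted centroid, used here as the position estimate.
    centroid = [
        sum(w * p[i] for p, w in zip(points, weights)) / total for i in range(3)
    ]

    # Weighted 3x3 covariance of the centred points.
    cov = [[0.0] * 3 for _ in range(3)]
    for p, w in zip(points, weights):
        d = [p[i] - centroid[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += w * d[i] * d[j] / total

    # Power iteration for the dominant eigenvector (the principal axis).
    # Assumption: the start vector is not orthogonal to that axis and the
    # largest eigenvalue is well separated; otherwise convergence is poor.
    v = [0.3, 0.5, 0.8]
    for _ in range(iters):
        nxt = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in nxt))
        if norm < 1e-12:
            raise ValueError("degenerate point set: no dominant axis")
        v = [c / norm for c in nxt]

    # PCA axes have no inherent sign; make the largest component positive
    # so the result is deterministic.
    k = max(range(3), key=lambda i: abs(v[i]))
    if v[k] < 0:
        v = [-c for c in v]
    return centroid, v


# Usage: four equally weighted points spread along the x-axis.
centroid, axis = weighted_pca_pose(
    [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)], [1, 1, 1, 1]
)
```

A full 6-DoF orientation would need the remaining two eigenvectors and a rule for resolving the sign ambiguity of each axis, which this sketch leaves out.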

large language model, machine learning, natural language

Country:
- North America > United States (0.15)

Genre:
- Research Report (0.40)

Industry:
- Government (0.36)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Vision > Video Understanding (0.73)
  - Natural Language > Large Language Model (0.63)
  - Machine Learning > Neural Networks (0.48)
