Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation
Dokania, Srijan, Raghavan, Dharini
arXiv.org Artificial Intelligence
Abstract: We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared, 6-DoF world model for multilateral teleoperation. By integrating vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3-D Gaussian Splatting (3DGS), TeleAssist provides every operator with real-time global positions and orientations of multiple robots, without fiducials or depth sensors, in an interaction-centric teleoperation setup. Teleoperating robots in complex or remote environments is challenging due to limited on-board perception, occlusions, and operator cognitive load. Traditional teleoperation relies on the robot's own sensors (cameras, LiDAR, IMU), which often suffer from narrow fields of view, occlusions, and cumulative drift; together these increase the cognitive load on human operators, who must maintain situational awareness. Meanwhile, external camera infrastructure (e.g., CCTV) can provide complementary visual coverage and global context, but conventional solutions rely heavily on visual fiducials, such as AprilTags or ArUco markers [5], or on motion-capture systems that require controlled lighting and calibration.
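The abstract names weighted-PCA pose extraction without giving details, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes the segmentation mask and monocular depth have already been back-projected into an (N, 3) array of 3-D points with one non-negative confidence weight per point; the function name `weighted_pca_pose` and both inputs are hypothetical.

```python
import numpy as np


def weighted_pca_pose(points, weights):
    """Estimate a position and orientation from weighted 3-D points.

    points  : (N, 3) array of 3-D points on the robot (assumed input)
    weights : (N,) non-negative per-point confidences (assumed input)

    Returns (centroid, axes): the weighted mean position and a 3x3
    matrix whose columns are the principal axes, ordered by
    decreasing weighted variance.
    """
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1

    # Weighted mean gives the position estimate.
    centroid = w @ points
    centered = points - centroid

    # Weighted covariance: sum_i w_i * (p_i - c)(p_i - c)^T
    cov = (centered * w[:, None]).T @ centered

    # eigh returns eigenvalues in ascending order for symmetric input.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]

    # Flip the last axis if needed so the frame is right-handed.
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1
    return centroid, axes
```

PCA fixes each axis only up to sign, so a real pipeline would need an extra cue (for example, motion direction or a semantic front/back label) to resolve the heading; that step is outside this sketch.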
Dec-10-2025
- Country:
- North America > United States (0.15)
- Genre:
- Research Report (0.40)
- Industry:
- Government (0.36)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.48)
- Natural Language > Large Language Model (0.63)
- Robots (1.00)
- Vision > Video Understanding (0.73)