AITopics | rgb-d camera

Collaborating Authors

rgb-d camera

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras

Neural Information Processing SystemsDec-24-2025, 10:33:41 GMT

We introduce DROID-SLAM, a new deep learning based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer. DROID-SLAM is accurate, achieving large improvements over prior work, and robust, suffering from substantially fewer catastrophic failures. Despite training on monocular video, it can leverage stereo or RGB-D video to achieve improved performance at test time.

deep visual slam, droid-slam, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)

Add feedback

AFT: Appearance-Based Feature Tracking for Markerless and Training-Free Shape Reconstruction of Soft Robots

Yuan, Shangyuan, Fairchild, Preston, Mei, Yu, Zhou, Xinyu, Tan, Xiaobo

arXiv.org Artificial IntelligenceNov-25-2025

Accurate shape reconstruction is essential for precise control and reliable operation of soft robots. Compared to sensor-based approaches, vision-based methods offer advantages in cost, simplicity, and ease of deployment. However, existing vision-based methods often rely on complex camera setups, specific backgrounds, or large-scale training datasets, limiting their practicality in real-world scenarios. In this work, we propose a vision-based, markerless, and training-free framework for soft robot shape reconstruction that directly leverages the robot's natural surface appearance. These surface features act as implicit visual markers, enabling a hierarchical matching strategy that decouples local partition alignment from global kinematic optimization. Requiring only an initial 3D reconstruction and kinematic alignment, our method achieves real-time shape tracking across diverse environments while maintaining robustness to occlusions and variations in camera viewpoints. Experimental validation on a continuum soft robot demonstrates an average tip error of 2.6% during real-time operation, as well as stable performance in practical closed-loop control tasks. These results highlight the potential of the proposed approach for reliable, low-cost deployment in dynamic real-world settings.

artificial intelligence, machine learning, robot, (15 more...)

arXiv.org Artificial Intelligence

2511.18215

Country: North America > United States > Michigan (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

3D Mapping Using a Lightweight and Low-Power Monocular Camera Embedded inside a Gripper of Limbed Climbing Robots

Okawara, Taku, Nishibe, Ryo, Kasano, Mao, Uno, Kentaro, Yoshida, Kazuya

arXiv.org Artificial IntelligenceNov-11-2025

Limbed climbing robots are designed to explore challenging vertical walls, such as the skylights of the Moon and Mars. In such robots, the primary role of a hand-eye camera is to accurately estimate 3D positions of graspable points (i.e., convex terrain surfaces) thanks to its close-up views. While conventional climbing robots often employ RGB-D cameras as hand-eye cameras to facilitate straightforward 3D terrain mapping and graspable point detection, RGB-D cameras are large and consume considerable power. This work presents a 3D terrain mapping system designed for space exploration using limbed climbing robots equipped with a monocular hand-eye camera. Compared to RGB-D cameras, monocular cameras are more lightweight, compact structures, and have lower power consumption. Although monocular SLAM can be used to construct 3D maps, it suffers from scale ambiguity. To address this limitation, we propose a SLAM method that fuses monocular visual constraints with limb forward kinematics. The proposed method jointly estimates time-series gripper poses and the global metric scale of the 3D map based on factor graph optimization. We validate the proposed framework through both physics-based simulations and real-world experiments. The results demonstrate that our framework constructs a metrically scaled 3D terrain map in real-time and enables autonomous grasping of convex terrain surfaces using a monocular hand-eye camera, without relying on RGB-D cameras. Our method contributes to scalable and energy-efficient perception for future space missions involving limbed climbing robots. See the video summary here: https://youtu.be/fMBrrVNKJfc

artificial intelligence, robot, terrain surface, (18 more...)

arXiv.org Artificial Intelligence

2511.05816

Country: Asia > Japan > Honshū (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)

Add feedback

RoboRetriever: Single-Camera Robot Object Retrieval via Active and Interactive Perception with Dynamic Scene Graph

Wang, Hecheng, Ren, Jiankun, Yu, Jia, Qi, Lizhe, Sun, Yunquan

arXiv.org Artificial IntelligenceAug-19-2025

Humans effortlessly retrieve objects in cluttered, partially observable environments by combining visual reasoning, active viewpoint adjustment, and physical interaction-with only a single pair of eyes. In contrast, most existing robotic systems rely on carefully positioned fixed or multi-camera setups with complete scene visibility, which limits adaptability and incurs high hardware costs. We present \textbf{RoboRetriever}, a novel framework for real-world object retrieval that operates using only a \textbf{single} wrist-mounted RGB-D camera and free-form natural language instructions. RoboRetriever grounds visual observations to build and update a \textbf{dynamic hierarchical scene graph} that encodes object semantics, geometry, and inter-object relations over time. The supervisor module reasons over this memory and task instruction to infer the target object and coordinate an integrated action module combining \textbf{active perception}, \textbf{interactive perception}, and \textbf{manipulation}. To enable task-aware scene-grounded active perception, we introduce a novel visual prompting scheme that leverages large reasoning vision-language models to determine 6-DoF camera poses aligned with the semantic task goal and geometry scene context. We evaluate RoboRetriever on diverse real-world object retrieval tasks, including scenarios with human intervention, demonstrating strong adaptability and robustness in cluttered scenes with only one RGB-D camera.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2508.12916

Country:

Europe (0.69)
North America > United States (0.68)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction

Acar, Ayberk, Smith, Mariana, Al-Zogbi, Lidia, Watts, Tanner, Li, Fangjie, Li, Hao, Yilmaz, Nural, Scheikl, Paul Maria, d'Almeida, Jesse F., Sharma, Susheela, Branscombe, Lauren, Ertop, Tayfun Efe, Webster, Robert J. III, Oguz, Ipek, Kuntz, Alan, Krieger, Axel, Wu, Jie Ying

arXiv.org Artificial IntelligenceMar-20-2025

Surgical automation requires precise guidance and understanding of the scene. Current methods in the literature rely on bulky depth cameras to create maps of the anatomy, however this does not translate well to space-limited clinical applications. Monocular cameras are small and allow minimally invasive surgeries in tight spaces but additional processing is required to generate 3D scene understanding. We propose a 3D mapping pipeline that uses only RGB images to create segmented point clouds of the target anatomy. To ensure the most precise reconstruction, we compare different structure from motion algorithms' performance on mapping the central airway obstructions, and test the pipeline on a downstream task of tumor resection. In several metrics, including post-procedure tissue model evaluation, our pipeline performs comparably to RGB-D cameras and, in some cases, even surpasses their performance. These promising results demonstrate that automation guidance can be achieved in minimally invasive procedures with monocular cameras. This study is a step toward the complete autonomy of surgical robots.

artificial intelligence, machine learning, pipeline, (15 more...)

arXiv.org Artificial Intelligence

2503.16263

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.14)
North America > United States > Tennessee > Davidson County > Nashville (0.05)
South America > Uruguay > Maldonado > Maldonado (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.94)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.96)

Add feedback

A-SEE2.0: Active-Sensing End-Effector for Robotic Ultrasound Systems with Dense Contact Surface Perception Enabled Probe Orientation Adjustment

Zhetpissov, Yernar, Ma, Xihan, Yang, Kehan, Zhang, Haichong K.

arXiv.org Artificial IntelligenceMar-7-2025

Conventional freehand ultrasound (US) imaging is highly dependent on the skill of the operator, often leading to inconsistent results and increased physical demand on sonographers. Robotic Ultrasound Systems (RUSS) aim to address these limitations by providing standardized and automated imaging solutions, especially in environments with limited access to skilled operators. This paper presents the development of a novel RUSS system that employs dual RGB-D depth cameras to maintain the US probe normal to the skin surface, a critical factor for optimal image quality. Our RUSS integrates RGB-D camera data with robotic control algorithms to maintain orthogonal probe alignment on uneven surfaces without preoperative data. Validation tests using a phantom model demonstrate that the system achieves robust normal positioning accuracy while delivering ultrasound images comparable to those obtained through manual scanning. A-SEE2.0 demonstrates 2.47 ${\pm}$ 1.25 degrees error for flat surface normal-positioning and 12.19 ${\pm}$ 5.81 degrees normal estimation error on mannequin surface. This work highlights the potential of A-SEE2.0 to be used in clinical practice by testing its performance during in-vivo forearm ultrasound examinations.

a-see2, point cloud, probe, (13 more...)

arXiv.org Artificial Intelligence

2503.05569

Country:

North America > United States > Massachusetts > Worcester County > Worcester (0.04)
Europe > Germany (0.04)
Europe > Czechia > Prague (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

Sixth-Sense: Self-Supervised Learning of Spatial Awareness of Humans from a Planar Lidar

Arreghini, Simone, Carlotti, Nicholas, Nava, Mirko, Paolillo, Antonio, Giusti, Alessandro

arXiv.org Artificial IntelligenceFeb-28-2025

Localizing humans is a key prerequisite for any service robot operating in proximity to people. In these scenarios, robots rely on a multitude of state-of-the-art detectors usually designed to operate with RGB-D cameras or expensive 3D LiDARs. However, most commercially available service robots are equipped with cameras with a narrow field of view, making them blind when a user is approaching from other directions, or inexpensive 1D LiDARs whose readings are difficult to interpret. To address these limitations, we propose a self-supervised approach to detect humans and estimate their 2D pose from 1D LiDAR data, using detections from an RGB-D camera as a supervision source. Our approach aims to provide service robots with spatial awareness of nearby humans. After training on 70 minutes of data autonomously collected in two environments, our model is capable of detecting humans omnidirectionally from 1D LiDAR data in a novel environment, with 71% precision and 80% recall, while retaining an average absolute error of 13 cm in distance and 44{\deg} in orientation.

detection, robot, sensor, (14 more...)

arXiv.org Artificial Intelligence

2502.21029

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Add feedback

Visual-Haptic Model Mediated Teleoperation for Remote Ultrasound

Black, David, Tirindelli, Maria, Salcudean, Septimiu, Wein, Wolfgang, Esposito, Marco

arXiv.org Artificial IntelligenceFeb-11-2025

Tele-ultrasound has the potential greatly to improve health equity for countless remote communities. However, practical scenarios involve potentially large time delays which cause current implementations of telerobotic ultrasound (US) to fail. Using a local model of the remote environment to provide haptics to the expert operator can decrease teleoperation instability, but the delayed visual feedback remains problematic. This paper introduces a robotic tele-US system in which the local model is not only haptic, but also visual, by re-slicing and rendering a pre-acquired US sweep in real time to provide the operator a preview of what the delayed image will resemble. A prototype system is presented and tested with 15 volunteer operators. It is found that visual-haptic model-mediated teleoperation (MMT) compensates completely for time delays up to 1000 ms round trip in terms of operator effort and completion time while conventional MMT does not. Visual-haptic MMT also significantly outperforms MMT for longer time delays in terms of motion accuracy and force control. This proof-of-concept study suggests that visual-haptic MMT may facilitate remote robotic tele-US.

artificial intelligence, teleoperation, time delay, (16 more...)

arXiv.org Artificial Intelligence

2502.07922

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > South Carolina > York County > Rock Hill (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > Experimental Study (0.49)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras

Neural Information Processing SystemsJan-14-2025, 15:58:30 GMT

deep visual slam, droid-slam, rgb-d camera, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Multilayer occupancy grid for obstacle avoidance in an autonomous ground vehicle using RGB-D camera

Gallego, Jhair S., Ramirez, Ricardo E.

arXiv.org Artificial IntelligenceNov-19-2024

This work describes the process of integrating a depth camera into the navigation system of a self-driving ground vehicle (SDV) and the implementation of a multilayer costmap that enhances the vehicle's obstacle identification process by expanding its two-dimensional field of view, based on 2D LIDAR, to a three-dimensional perception system using an RGB-D camera. This approach lays the foundation for a robust vision-based navigation and obstacle detection system. A theoretical review is presented and implementation results are discussed for future work.

artificial intelligence, planning & scheduling, vehicle, (17 more...)

arXiv.org Artificial Intelligence

2411.12535

Country:

South America > Colombia > Bogotá D.C. > Bogotá (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.51)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.35)

Add feedback