AITopics | depth data

Collaborating Authors

depth data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Real-Time Obstacle Avoidance for a Mobile Robot Using CNN-Based Sensor Fusion

Zain, Lamiaa H.

arXiv.org Artificial IntelligenceDec-1-2025

Obstacle avoidance is a critical component of the navigation stack required for mobile robots to operate effectively in complex and unknown environments. In this research, three end-to-end Convolutional Neural Networks (CNNs) were trained and evaluated offline and deployed on a differential-drive mobile robot for real-time obstacle avoidance to generate low-level steering commands from synchronized color and depth images acquired by an Intel RealSense D415 RGB-D camera in diverse environments. Offline evaluation showed that the NetConEmb model achieved the best performance with a notably low MedAE of $0.58 \times 10^{-3}$ rad/s. In comparison, the lighter NetEmb architecture, which reduces the number of trainable parameters by approximately 25\% and converges faster, produced comparable results with an RMSE of $21.68 \times 10^{-3}$ rad/s, close to the $21.42 \times 10^{-3}$ rad/s obtained by NetConEmb. Real-time navigation further confirmed NetConEmb's robustness, achieving a 100\% success rate in both known and unknown environments, while NetEmb and NetGated succeeded only in navigating the known environment.

artificial intelligence, machine learning, navigation, (19 more...)

arXiv.org Artificial Intelligence

2509.08095

Country: Africa > Middle East > Egypt (0.29)

Genre: Research Report (1.00)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)

Add feedback

Imitation Learning for Obstacle Avoidance Using End-to-End CNN-Based Sensor Fusion

Zain, Lamiaa H., Ammar, Hossam H., Shalaby, Raafat E.

arXiv.org Artificial IntelligenceJul-14-2025

Obstacle avoidance is crucial for mobile robots' navigation in both known and unknown environments. This research designs, trains, and tests two custom Convolutional Neural Networks (CNNs), using color and depth images from a depth camera as inputs. Both networks adopt sensor fusion to produce an output: the mobile robot's angular velocity, which serves as the robot's steering command. A newly obtained visual dataset for navigation was collected in diverse environments with varying lighting conditions and dynamic obstacles. During data collection, a communication link was established over Wi-Fi between a remote server and the robot, using Robot Operating System (ROS) topics. Velocity commands were transmitted from the server to the robot, enabling synchronized recording of visual data and the corresponding steering commands. Various evaluation metrics, such as Mean Squared Error, Variance Score, and Feed-Forward time, provided a clear comparison between the two networks and clarified which one to use for the application.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.08112

Country: Africa > Middle East > Egypt (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Telecommunications (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Category-Level and Open-Set Object Pose Estimation for Robotics

Hönig, Peter, Hirschmanner, Matthias, Vincze, Markus

arXiv.org Artificial IntelligenceApr-29-2025

Object pose estimation enables a variety of tasks in computer vision and robotics, including scene understanding and robotic grasping. The complexity of a pose estimation task depends on the unknown variables related to the target object. While instance-level methods already excel for opaque and Lambertian objects, category-level and open-set methods, where texture, shape, and size are partially or entirely unknown, still struggle with these basic material properties. Since texture is unknown in these scenarios, it cannot be used for disambiguating object symmetries, another core challenge of 6D object pose estimation. The complexity of estimating 6D poses with such a manifold of unknowns led to various datasets, accuracy metrics, and algorithmic solutions. This paper compares datasets, accuracy metrics, and algorithms for solving 6D pose estimation on the category-level. Based on this comparison, we analyze how to bridge category-level and open-set object pose estimation to reach generalization and provide actionable recommendations.

artificial intelligence, pose estimation, video understanding, (16 more...)

arXiv.org Artificial Intelligence

2504.19572

Country: Europe > Austria (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)

Add feedback

Multimodal Object Detection using Depth and Image Data for Manufacturing Parts

Mahjourian, Nazanin, Nguyen, Vinh

arXiv.org Artificial IntelligenceNov-13-2024

Manufacturing requires reliable object detection methods for precise picking and handling of diverse types of manufacturing parts and components. Traditional object detection methods utilize either only 2D images from cameras or 3D data from lidars or similar 3D sensors. However, each of these sensors have weaknesses and limitations. Cameras do not have depth perception and 3D sensors typically do not carry color information. These weaknesses can undermine the reliability and robustness of industrial manufacturing systems. To address these challenges, this work proposes a multi-sensor system combining an red-green-blue (RGB) camera and a 3D point cloud sensor. The two sensors are calibrated for precise alignment of the multimodal data captured from the two hardware devices. A novel multimodal object detection method is developed to process both RGB and depth data. This object detector is based on the Faster R-CNN baseline that was originally designed to process only camera images. The results show that the multimodal model significantly outperforms the depth-only and RGB-only baselines on established object detection metrics. More specifically, the multimodal model improves mAP by 13% and raises Mean Precision by 11.8% in comparison to the RGB-only baseline. Compared to the depth-only baseline, it improves mAP by 78% and raises Mean Precision by 57%. Hence, this method facilitates more reliable and robust object detection in service to smart manufacturing applications.

detection, sensor, variant, (16 more...)

arXiv.org Artificial Intelligence

2411.09062

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)

Add feedback

SelfReDepth: Self-Supervised Real-Time Depth Restoration for Consumer-Grade Sensors

Duarte, Alexandre, Fernandes, Francisco, Pereira, João M., Moreira, Catarina, Nascimento, Jacinto C., Jorge, Joaquim

arXiv.org Artificial IntelligenceJun-5-2024

Depth maps produced by consumer-grade sensors suffer from inaccurate measurements and missing data from either system or scene-specific sources. Data-driven denoising algorithms can mitigate such problems. However, they require vast amounts of ground truth depth data. Recent research has tackled this limitation using self-supervised learning techniques, but it requires multiple RGB-D sensors. Moreover, most existing approaches focus on denoising single isolated depth maps or specific subjects of interest, highlighting a need for methods to effectively denoise depth maps in real-time dynamic environments. This paper extends state-of-the-art approaches for depth-denoising commodity depth devices, proposing SelfReDepth, a self-supervised deep learning technique for depth restoration, via denoising and hole-filling by inpainting full-depth maps captured with RGB-D sensors. The algorithm targets depth data in video streams, utilizing multiple sequential depth frames coupled with color data to achieve high-quality depth videos with temporal coherence. Finally, SelfReDepth is designed to be compatible with various RGB-D sensors and usable in real-time scenarios as a pre-processing step before applying other depth-dependent algorithms. Our results demonstrate our approach's real-time performance on real-world datasets. They show that it outperforms state-of-the-art denoising and restoration performance at over 30fps on Commercial Depth Cameras, with potential benefits for augmented and mixed-reality applications.

artificial intelligence, machine learning, real time system, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s11554-024-01491-z

2406.03388

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.66)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Add feedback

Universal Bovine Identification via Depth Data and Deep Metric Learning

Sharma, Asheesh, Randewich, Lucy, Andrew, William, Hannuna, Sion, Campbell, Neill, Mullan, Siobhan, Dowsey, Andrew W., Smith, Melvyn, Hansen, Mark, Burghardt, Tilo

arXiv.org Artificial IntelligenceMar-29-2024

This paper proposes and evaluates, for the first time, a top-down (dorsal view), depth-only deep learning system for accurately identifying individual cattle and provides associated code, datasets, and training weights for immediate reproducibility. An increase in herd size skews the cow-to-human ratio at the farm and makes the manual monitoring of individuals more challenging. Therefore, real-time cattle identification is essential for the farms and a crucial step towards precision livestock farming. Underpinned by our previous work, this paper introduces a deep-metric learning method for cattle identification using depth data from an off-the-shelf 3D camera. The method relies on CNN and MLP backbones that learn well-generalised embedding spaces from the body shape to differentiate individuals -- requiring neither species-specific coat patterns nor close-up muzzle prints for operation. The network embeddings are clustered using a simple algorithm such as $k$-NN for highly accurate identification, thus eliminating the need to retrain the network for enrolling new individuals. We evaluate two backbone architectures, ResNet, as previously used to identify Holstein Friesians using RGB images, and PointNet, which is specialised to operate on 3D point clouds. We also present CowDepth2023, a new dataset containing 21,490 synchronised colour-depth image pairs of 99 cows, to evaluate the backbones. Both ResNet and PointNet architectures, which consume depth maps and point clouds, respectively, led to high accuracy that is on par with the coat pattern-based backbone.

accuracy, dataset, identification, (13 more...)

arXiv.org Artificial Intelligence

2404.00172

Country:

Europe > United Kingdom > England (0.04)
North America (0.04)
Europe > United Kingdom > Northern Ireland (0.04)
Europe > Sweden (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Food & Agriculture > Agriculture (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Efficient Gesture Recognition on Spiking Convolutional Networks Through Sensor Fusion of Event-Based and Depth Data

Steffen, Lea, Trapp, Thomas, Roennau, Arne, Dillmann, Rüdiger

arXiv.org Artificial IntelligenceJan-30-2024

As intelligent systems become increasingly important in our daily lives, new ways of interaction are needed. Classical user interfaces pose issues for the physically impaired and are partially not practical or convenient. Gesture recognition is an alternative, but often not reactive enough when conventional cameras are used. This work proposes a Spiking Convolutional Neural Network, processing event- and depth data for gesture recognition. The network is simulated using the open-source neuromorphic computing framework LAVA for offline training and evaluation on an embedded system. For the evaluation three open source data sets are used. Since these do not represent the applied bi-modality, a new data set with synchronized event- and depth data was recorded. The results show the viability of temporal encoding on depth information and modality fusion, even on differently encoded data, to be beneficial to network performance and generalization capabilities.

dataset, depth data, recognition, (13 more...)

arXiv.org Artificial Intelligence

2401.17064

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Converting Depth Images and Point Clouds for Feature-based Pose Estimation

Lösch, Robert, Sastuba, Mark, Toth, Jonas, Jung, Bernhard

arXiv.org Artificial IntelligenceOct-23-2023

In recent years, depth sensors have become more and more affordable and have found their way into a growing amount of robotic systems. However, mono- or multi-modal sensor registration, often a necessary step for further processing, faces many challenges on raw depth images or point clouds. This paper presents a method of converting depth data into images capable of visualizing spatial details that are basically hidden in traditional depth images. After noise removal, a neighborhood of points forms two normal vectors whose difference is encoded into this new conversion. Compared to Bearing Angle images, our method yields brighter, higher-contrast images with more visible contours and more details. We tested feature-based pose estimation of both conversions in a visual odometry task and RGB-D SLAM. For all tested features, AKAZE, ORB, SIFT, and SURF, our new Flexion images yield better results than Bearing Angle images and show great potential to bridge the gap between depth data and classical computer vision. Source code is available here: https://rlsch.github.io/depth-flexion-conversion.

ba image, depth image, flexion image, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IROS55552.2023.10341758

2310.14924

Country:

Europe > Germany (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Portugal (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (0.61)

Add feedback

Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

Xiao, Hejun, Peng, Kunyu, Huang, Xiangsheng, Roitberg1, Alina, Li, Hao, Wang, Zhaohui, Stiefelhagen, Rainer

arXiv.org Artificial IntelligenceAug-23-2023

Fall detection is a vital task in health monitoring, as it allows the system to trigger an alert and therefore enabling faster interventions when a person experiences a fall. Although most previous approaches rely on standard RGB video data, such detailed appearance-aware monitoring poses significant privacy concerns. Depth sensors, on the other hand, are better at preserving privacy as they merely capture the distance of objects from the sensor or camera, omitting color and texture information. In this paper, we introduce a privacy-supporting solution that makes the RGB-trained model applicable in depth domain and utilizes depth data at test time for fall detection. To achieve cross-modal fall detection, we present an unsupervised RGB to Depth (RGB2Depth) cross-modal domain adaptation approach that leverages labelled RGB data and unlabelled depth data during training. Our proposed pipeline incorporates an intermediate domain module for feature bridging, modality adversarial loss for modality discrimination, classification loss for pseudo-labeled depth data and labeled source data, triplet loss that considers both source and target domains, and a novel adaptive loss weight adjustment method for improved coordination among various losses. Our approach achieves state-of-the-art results in the unsupervised RGB2Depth domain adaptation task for fall detection. Code is available at https://github.com/1015206533/privacy_supporting_fall_detection.

artificial intelligence, human computer interaction, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2308.12049

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (0.54)
Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Add feedback

Clothes Grasping and Unfolding Based on RGB-D Semantic Segmentation

Zhu, Xingyu, Wang, Xin, Freer, Jonathan, Chang, Hyung Jin, Gao, Yixing

arXiv.org Artificial IntelligenceMay-8-2023

Clothes grasping and unfolding is a core step in robotic-assisted dressing. Most existing works leverage depth images of clothes to train a deep learning-based model to recognize suitable grasping points. These methods often utilize physics engines to synthesize depth images to reduce the cost of real labeled data collection. However, the natural domain gap between synthetic and real images often leads to poor performance of these methods on real data. Furthermore, these approaches often struggle in scenarios where grasping points are occluded by the clothing item itself. To address the above challenges, we propose a novel Bi-directional Fractal Cross Fusion Network (BiFCNet) for semantic segmentation, enabling recognition of graspable regions in order to provide more possibilities for grasping. Instead of using depth images only, we also utilize RGB images with rich color features as input to our network in which the Fractal Cross Fusion (FCF) module fuses RGB and depth data by considering global complex features based on fractal geometry. To reduce the cost of real data collection, we further propose a data augmentation method based on an adversarial strategy, in which the color and geometric transformations simultaneously process RGB and depth data while maintaining the label correspondence. Finally, we present a pipeline for clothes grasping and unfolding from the perspective of semantic segmentation, through the addition of a strategy for grasp point selection from segmentation regions based on clothing flatness measures, while taking into account the grasping direction. We evaluate our BiFCNet on the public dataset NYUDv2 and obtained comparable performance to current state-of-the-art models. We also deploy our model on a Baxter robot, running extensive grasping and unfolding experiments as part of our ablation studies, achieving an 84% success rate.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

2305.03259

Country:

Asia > China (0.05)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback