AITopics | pose estimation network

Collaborating Authors

pose estimation network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation

Sung, Mingyu, Choe, Hyeonmin, Kim, Il-Min, Yun, Sangseok, Kang, Jae Mo

arXiv.org Artificial IntelligenceNov-10-2025

Monocular depth estimation (MDE), inferring pixel-level depths in single RGB images from a monocular camera, plays a crucial and pivotal role in a variety of AI applications demanding a three-dimensional (3D) topographical scene. In the real-world scenarios, MDE models often need to be deployed in environments with different conditions from those for training. Test-time (domain) adaptation (TTA) is one of the compelling and practical approaches to address the issue. Although there have been notable advancements in TTA for MDE, particularly in a self-supervised manner, existing methods are still ineffective and problematic when applied to diverse and dynamic environments. To break through this challenge, we propose a novel and high-performing TTA framework for MDE, named PITTA. Our approach incorporates two key innovative strategies: (i) pose-agnostic TTA paradigm for MDE and (ii) instance-aware image masking. Specifically, PITTA enables highly effective TTA on a pretrained MDE network in a pose-agnostic manner without resorting to any camera pose information. Besides, our instance-aware masking strategy extracts instance-wise masks for dynamic objects (e.g., vehicles, pedestrians, etc.) from a segmentation mask produced by a pretrained panoptic segmentation network, by removing static objects including background components. To further boost performance, we also present a simple yet effective edge extraction methodology for the input image (i.e., a single monocular image) and depth map. Extensive experimental evaluations on DrivingStereo and Waymo datasets with varying environmental conditions demonstrate that our proposed framework, PITTA, surpasses the existing state-of-the-art techniques with remarkable performance improvements in MDE during TTA.

artificial intelligence, machine learning, mde network, (12 more...)

arXiv.org Artificial Intelligence

2511.05055

Country: Asia > South Korea (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving

Xie, Yusen, Huang, Zhengmin, Shen, Shaojie, Ma, Jun

arXiv.org Artificial IntelligenceMar-25-2025

In this paper, we introduce Semi-SD, a novel metric depth estimation framework tailored for surrounding cameras equipment in autonomous driving. In this work, the input data consists of adjacent surrounding frames and camera parameters. We propose a unified spatial-temporal-semantic fusion module to construct the visual fused features. Cross-attention components for surrounding cameras and adjacent frames are utilized to focus on metric scale information refinement and temporal feature matching. Building on this, we propose a pose estimation framework using surrounding cameras, their corresponding estimated depths, and extrinsic parameters, which effectively address the scale ambiguity in multi-camera setups. Moreover, semantic world model and monocular depth estimation world model are integrated to supervised the depth estimation, which improve the quality of depth estimation. We evaluate our algorithm on DDAD and nuScenes datasets, and the results demonstrate that our method achieves state-of-the-art performance in terms of surrounding camera based depth estimation quality. The source code will be available on https://github.com/xieyuser/Semi-SD.

artificial intelligence, image understanding, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.19713

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Transportation > Ground > Road (0.62)
Information Technology > Robotics & Automation (0.62)
Automobiles & Trucks (0.62)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Increasing the Task Flexibility of Heavy-Duty Manipulators Using Visual 6D Pose Estimation of Objects

Mäkinen, Petri, Mustalahti, Pauli, Kivelä, Tuomo, Mattila, Jouni

arXiv.org Artificial IntelligenceFeb-26-2025

Recent advances in visual 6D pose estimation of objects using deep neural networks have enabled novel ways of vision-based control for heavy-duty robotic applications. In this study, we present a pipeline for the precise tool positioning of heavy-duty, long-reach (HDLR) manipulators using advanced machine vision. A camera is utilized in the so-called eye-in-hand configuration to estimate directly the poses of a tool and a target object of interest (OOI). Based on the pose error between the tool and the target, along with motion-based calibration between the camera and the robot, precise tool positioning can be reliably achieved using conventional robotic modeling and control methods prevalent in the industry. The proposed methodology comprises orientation and position alignment based on the visually estimated OOI poses, whereas camera-to-robot calibration is conducted based on motion utilizing visual SLAM. The methods seek to avert the inaccuracies resulting from rigid-body--based kinematics of structurally flexible HDLR manipulators via image-based algorithms. To train deep neural networks for OOI pose estimation, only synthetic data are utilized. The methods are validated in a real-world setting using an HDLR manipulator with a 5 m reach. The experimental results demonstrate that an image-based average tool positioning error of less than 2 mm along the non-depth axes is achieved, which facilitates a new way to increase the task flexibility and automation level of non-rigid HDLR manipulators.

hdlr manipulator, manipulator, pose estimation, (16 more...)

arXiv.org Artificial Intelligence

2502.19169

Country:

Europe > Finland > Pirkanmaa > Tampere (0.04)
North America > United States > New York (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Towards real-time 6D pose estimation of objects in single-view cone-beam X-ray

Viviers, Christiaan G. A., de Bruijn, Joel, Filatova, Lena, de With, Peter H. N., van der Sommen, Fons

arXiv.org Artificial IntelligenceNov-6-2022

Deep learning-based pose estimation algorithms can successfully estimate the pose of objects in an image, especially in the field of color images. 6D Object pose estimation based on deep learning models for X-ray images often use custom architectures that employ extensive CAD models and simulated data for training purposes. Recent RGB-based methods opt to solve pose estimation problems using small datasets, making them more attractive for the X-ray domain where medical data is scarcely available. We refine an existing RGB-based model (SingleShotPose) to estimate the 6D pose of a marked cube from grayscale X-ray images by creating a generic solution trained on only real X-ray data and adjusted for X-ray acquisition geometry. The model regresses 2D control points and calculates the pose through 2D/3D correspondences using Perspective-n-Point(PnP), allowing a single trained model to be used across all supporting cone-beam-based X-ray geometries. Since modern X-ray systems continuously adjust acquisition parameters during a procedure, it is essential for such a pose estimation network to consider these parameters in order to be deployed successfully and find a real use case. With a 5-cm/5-degree accuracy of 93% and an average 3D rotation error of 2.2 degrees, the results of the proposed approach are comparable with state-of-the-art alternatives, while requiring significantly less real training examples and being applicable in real-time applications.

artificial intelligence, estimation, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2211.03211

Country: Europe > Netherlands > North Brabant > Eindhoven (0.05)

Genre: Research Report (0.50)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking

Chen, Kai, Cao, Rui, James, Stephen, Li, Yichuan, Liu, Yun-Hui, Abbeel, Pieter, Dou, Qi

arXiv.org Artificial IntelligenceJul-21-2022

However, it is burdensome to annotate a customized dataset associated with each specific bin-picking scenario for training pose estimation models. In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.

estimation, pose estimation, real data, (14 more...)

arXiv.org Artificial Intelligence

2204.07049

Country:

Asia > China > Hong Kong (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (1.00)

Industry: Education > Educational Technology (0.56)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adversarial Attacks on Monocular Pose Estimation

Chawla, Hemang, Varma, Arnav, Arani, Elahe, Zonooz, Bahram

arXiv.org Artificial IntelligenceJul-14-2022

Advances in deep learning have resulted in steady progress in computer vision with improved accuracy on tasks such as object detection and semantic segmentation. Nevertheless, deep neural networks are vulnerable to adversarial attacks, thus presenting a challenge in reliable deployment. Two of the prominent tasks in 3D scene-understanding for robotics and advanced drive assistance systems are monocular depth and pose estimation, often learned together in an unsupervised manner. While studies evaluating the impact of adversarial attacks on monocular depth estimation exist, a systematic demonstration and analysis of adversarial perturbations against pose estimation are lacking. We show how additive imperceptible perturbations can not only change predictions to increase the trajectory drift but also catastrophically alter its geometry. We also study the relation between adversarial perturbations targeting monocular depth and pose estimation networks, as well as the transferability of perturbations to other networks with different architectures and losses. Our experiments show how the generated perturbations lead to notable errors in relative rotation and translation predictions and elucidate vulnerabilities of the networks.

adversarial attack, artificial intelligence, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IROS47612.2022.9982154

2207.07032

Country: Europe > Netherlands (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.85)
Government > Military (0.85)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sim2Real Instance-Level Style Transfer for 6D Pose Estimation

Ikeda, Takuya, Tanishige, Suomi, Amma, Ayako, Sudano, Michael, Audren, Hervé, Nishiwaki, Koichi

arXiv.org Artificial IntelligenceMar-3-2022

In recent years, synthetic data has been widely used in the training of 6D pose estimation networks, in part because it automatically provides perfect annotation at low cost. However, there are still non-trivial domain gaps, such as differences in textures/materials, between synthetic and real data. These gaps have a measurable impact on performance. To solve this problem, we introduce a simulation to reality (sim2real) instance-level style transfer for 6D pose estimation network training. Our approach transfers the style of target objects individually, from synthetic to real, without human intervention. This improves the quality of synthetic data for training pose estimation networks. We also propose a complete pipeline from data collection to the training of a pose estimation network and conduct extensive evaluation on a real-world robotic platform. Our evaluation shows significant improvement achieved by our method in both pose estimation performance and the realism of images adapted by the style transfer.

artificial intelligence, machine learning, synthetic image, (17 more...)

arXiv.org Artificial Intelligence

2203.02069

Country:

Europe > Finland (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Bangla sign language recognition using concatenated BdSL network

Abedin, Thasin, Prottoy, Khondokar S. S., Moshruba, Ayana, Hakim, Safayat Bin

arXiv.org Artificial IntelligenceJul-25-2021

Sign language is the only medium of communication for the hearing impaired and the deaf and dumb community. Communication with the general mass is thus always a challenge for this minority group. Especially in Bangla sign language (BdSL), there are 38 alphabets with some having nearly identical symbols. As a result, in BdSL recognition, the posture of hand is an important factor in addition to visual features extracted from traditional Convolutional Neural Network (CNN). In this paper, a novel architecture "Concatenated BdSL Network" is proposed which consists of a CNN based image network and a pose estimation network. While the image network gets the visual features, the relative positions of hand keypoints are taken by the pose estimation network to obtain the additional features to deal with the complexity of the BdSL symbols. A score of 91.51% was achieved by this novel approach in test set and the effectiveness of the additional pose estimation network is suggested by the experimental results.

language recognition, pose estimation network, recognition, (11 more...)

arXiv.org Artificial Intelligence

2107.11818

Country: Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)

Genre: Research Report > Promising Solution (0.48)

Industry: Education > Curriculum > Subject-Specific Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.30)

Add feedback

Geometric Change Detection in Digital Twins using 3D Machine Learning

Sundby, Tiril, Graham, Julia Maria, Rasheed, Adil, Tabib, Mandar, San, Omer

arXiv.org Artificial IntelligenceMar-15-2021

Digital twins are meant to bridge the gap between real-world physical systems and virtual representations. Both stand-alone and descriptive digital twins incorporate 3D geometric models, which are the physical representations of objects in the digital replica. Digital twin applications are required to rapidly update internal parameters with the evolution of their physical counterpart. Due to an essential need for having high-quality geometric models for accurate physical representations, the storage and bandwidth requirements for storing 3D model information can quickly exceed the available storage and bandwidth capacity. In this work, we demonstrate a novel approach to geometric change detection in the context of a digital twin. We address the issue through a combined solution of Dynamic Mode Decomposition (DMD) for motion detection, YOLOv5 for object detection, and 3D machine learning for pose estimation. DMD is applied for background subtraction, enabling detection of moving foreground objects in real-time. The video frames containing detected motion are extracted and used as input to the change detection network. The object detection algorithm YOLOv5 is applied to extract the bounding boxes of detected objects in the video frames. Furthermore, the rotational pose of each object is estimated in a 3D pose estimation network. A series of convolutional neural networks conducts feature extraction from images and 3D model shapes. Then, the network outputs the estimated Euler angles of the camera orientation with respect to the object in the input image. By only storing data associated with a detected change in pose, we minimize necessary storage and bandwidth requirements while still being able to recreate the 3D scene on demand.

algorithm, detection, estimation, (17 more...)

arXiv.org Artificial Intelligence

2103.08201

Country:

Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Oklahoma > Payne County > Stillwater (0.04)
(2 more...)

Genre:

Research Report (0.84)
Overview (0.66)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback