AITopics | camera coordinate system

Collaborating Authors

camera coordinate system

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

e3a0db7c0a191854c176af1d20cdec80-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-23-2026, 00:03:15 GMT

The descriptions of each task are as follows:799 Single-view tasks Single-view tasks test a model's ability to infer spatial properties from a single800 image. These tasks include:801 Depth estimation (OC, OO, NA): Predicting absolute or relative depth values for objects802 Distance prediction (OC, OO, NA): Estimating the Euclidean distance between objects or803 from an object to the camera.804 Object center distance inference (OO, MCA): Given objects A, B and C, determine which805 of B and C is farther or closer to A.806 Object spatial relation (OO, MCA): Determining relative positioning (e.g., left, right, in807 Spatial imagination (OC, OO, MCA): Predicting unseen spatial relationships based on809 limited visual information.810 Multi-view tasks Multi-view tasks require reasoning across multiple images to infer spatial rela-811 tionships. These tasks include:812 Viewpoint change inference (NA): Given two perspectives, output how the camera should813 be moved to see the second perspective.814 Multi-view distance prediction (OC, OO, NA): Estimating object distances across different816 views.817 Multi-view object matching (MCA): Identifying the same object across multiple views.818

artificial intelligence, image understanding, spatial reasoning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.48)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.34)

Add feedback

Appendices and Supplementary Material

Neural Information Processing SystemsJun-19-2026, 19:29:05 GMT

A.1 Coordinate Systems and Transformation To achieve spatial synchronization between different sensors, vehicle-vehicle-UAV collaboration requires using sensor parameter information to perform coordinate system transformations. The relationships between the coordinate systems are illustrated in Fig. S 1. Figure 1: Relationship between coordinate systems. The pixel coordinate system refers to a two-dimensional coordinate system defined on the image plane, typically represented as (u,v), with units in pixels. In this system, the origin is located at the top-left corner of the image, the u-axis points to the right along the horizontal direction, and the v-axis points downward along the vertical direction. This coordinate system is used to describe the position of points on the two-dimensional image captured by the camera.

artificial intelligence, coordinate system, platform, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.75)
Information Technology > Artificial Intelligence > Robots (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.50)

Add feedback

Appendix

Neural Information Processing SystemsOct-9-2025, 19:34:47 GMT

Our dataset is released under the CC BY -NC 4.0 license, allowing We include more dataset visualizations depicting various ego-actions in Figure 9 . Interaction plays a crucial role in driving scenarios. The sixth scenario showcases encountering road construction ahead, followed by encountering a street sweeper in the seventh scenario. Typically belonging to the tail end of a long-tail distribution, these scenarios are rare yet crucial for ensuring safe driving. Lane Changing: Changing lanes to overtake slower vehicles or merge into traffic.

dataset, drivingdojo dataset, please provide, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Vision (0.47)

Add feedback

3D Hand-Eye Calibration for Collaborative Robot Arm: Look at Robot Base Once

Li, Leihui, Wan, Lixuepiao, Krueger, Volker, Zhang, Xuping

arXiv.org Artificial IntelligenceMay-12-2025

Hand-eye calibration is a common problem in the field of collaborative robotics, involving the determination of the transformation matrix between the visual sensor and the robot flange to enable vision-based robotic tasks. However, this process typically requires multiple movements of the robot arm and an external calibration object, making it both time-consuming and inconvenient, especially in scenarios where frequent recalibration is necessary. In this work, we extend our previous method which eliminates the need for external calibration objects such as a chessboard. We propose a generic dataset generation approach for point cloud registration, focusing on aligning the robot base point cloud with the scanned data. Furthermore, a more detailed simulation study is conducted involving several different collaborative robot arms, followed by real-world experiments in an industrial setting. Our improved method is simulated and evaluated using a total of 14 robotic arms from 9 different brands, including KUKA, Universal Robots, UFACTORY, and Franka Emika, all of which are widely used in the field of collaborative robotics. Physical experiments demonstrate that our extended approach achieves performance comparable to existing commercial hand-eye calibration solutions, while completing the entire calibration procedure in just a few seconds. In addition, we provide a user-friendly hand-eye calibration solution, with the code publicly available at github.com/leihui6/LRBO.

artificial intelligence, calibration, hand-eye calibration, (13 more...)

arXiv.org Artificial Intelligence

2504.21619

Country: Europe (0.68)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)

Add feedback

Validation & Exploration of Multimodal Deep-Learning Camera-Lidar Calibration models

Karramreddy, Venkat, Mitchell, Liam

arXiv.org Artificial IntelligenceSep-20-2024

This article presents an innovative study in exploring, evaluating, and implementing deep learning architectures for the calibration of multi-modal sensor systems. The focus behind this is to leverage the use of sensor fusion to achieve dynamic, real-time alignment between 3D LiDAR and 2D Camera sensors. static calibration methods are tedious and time-consuming, which is why we propose utilizing Conventional Neural Networks (CNN) coupled with geometrically informed learning to solve this issue. We leverage the foundational principles of Extrinsic LiDAR-Camera Calibration tools such as RegNet, CalibNet, and LCCNet by exploring open-source models that are available online and comparing our results with their corresponding research papers. Requirements for extracting these visual and measurable outputs involved tweaking source code, fine-tuning, training, validation, and testing for each of these frameworks for equal comparisons. This approach aims to investigate which of these advanced networks produces the most accurate and consistent predictions. Through a series of experiments, we reveal some of their shortcomings and areas for potential improvements along the way. We find that LCCNet yields the best results out of all the models that we validated.

calibration, coordinate system, point cloud, (15 more...)

arXiv.org Artificial Intelligence

2409.13402

Country:

North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving

Li, Siyu, Yang, Kailun, Shi, Hao, Zhang, Jiaming, Lin, Jiacheng, Teng, Zhifeng, Li, Zhiyong

arXiv.org Artificial IntelligenceSep-6-2023

--A semantic map of the road scene, covering fundamental road elements, is an essential ingredient in autonomous driving systems. It provides important perception foundations for positioning and planning when rendered in the Bird's-Eye-View (BEV). Currently, the prior knowledge of hypothetical depth can guide the learning of translating front perspective views into BEV directly with the help of calibration parameters. However, it suffers from geometric distortions in the representation of distant objects. In addition, another stream of methods without prior knowledge can learn the transformation between front perspective views and BEV implicitly with a global view. Considering that the fusion of different learning methods may bring surprising beneficial effects, we propose a Bi-Mapper framework for top-down road-scene semantic understanding, which incorporates a global view and local prior knowledge. T o enhance reliable interaction between them, an asynchronous mutual learning strategy is proposed. At the same time, an Across-Space Loss (ASL) is designed to mitigate the negative impact of geometric distortions. Extensive results on nuScenes and Cam2BEV datasets verify the consistent effectiveness of each module in the proposed Bi-Mapper framework. Compared with exiting road mapping networks, the proposed Bi-Mapper achieves 2 . Moreover, we verify the generalization performance of Bi-Mapper in a real-world driving scenario. The source code is publicly available at BiMapper. N autonomous driving systems, a semantic map is an important basic element, which affects the downstream working, including location and planning. Recently, the Bird' s-Eye-View (BEV) map has shown an outstanding performance [1].

camera coordinate system, coordinate system, learning, (15 more...)

arXiv.org Artificial Intelligence

2305.04205

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.92)
Information Technology > Robotics & Automation (0.81)
Automobiles & Trucks (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.81)

Add feedback

3D Bounding Box Estimation Using Deep Learning and Geometry

#artificialintelligenceSep-22-2020, 07:25:16 GMT

The problem of 3D object detection is of particular importance in robotic applications that require decision making or interactions with objects in the real world. While recently developed 2D detection algorithms are capable of handling large variations in viewpoint and clutter, accurate 3D object detection largely remains an open problem despite some promising recent work. They first regress relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. Given estimated orientation and dimensions and the constraint that the projection of the 3D bounding box fits tightly into the 2D detection window, they recover the translation and the object's 3D bounding box. In order to study this article mathematically, we need a coordinate system.

artificial intelligence, deep learning, machine learning, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Efficiency in Real-time Webcam Gaze Tracking

Gudi, Amogh, Li, Xin, van Gemert, Jan

arXiv.org Artificial IntelligenceSep-2-2020

Efficiency and ease of use are essential for practical applications of camera based eye/gaze-tracking. Gaze tracking involves estimating where a person is looking on a screen based on face images from a computer-facing camera. In this paper we investigate two complementary forms of efficiency in gaze tracking: 1. The computational efficiency of the system which is dominated by the inference speed of a CNN predicting gaze-vectors; 2. The usability efficiency which is determined by the tediousness of the mandatory calibration of the gaze-vector to a computer screen. To do so, we evaluate the computational speed/accuracy trade-off for the CNN and the calibration effort/accuracy trade-off for screen calibration. For the CNN, we evaluate the full face, two-eyes, and single eye input. For screen calibration, we measure the number of calibration points needed and evaluate three types of calibration: 1. pure geometry, 2. pure machine learning, and 3. hybrid geometric regression. Results suggest that a single eye input and geometric regression calibration achieve the best trade-off.

artificial intelligence, calibration, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2009.0127

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.67)

Add feedback