AITopics | camera model

camera model

Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Neural Information Processing Systems | Feb-15-2026, 22:15:09 GMT

Top-down Bird's Eye View (BEV) maps are a popular perceptual representation for ground robot navigation due to their richness and flexibility for downstream

artificial intelligence, dataset, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(7 more...)

Genre: Research Report (0.93)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Neural Information Processing Systems | Oct-10-2025, 06:24:38 GMT

Top-down Bird's Eye View (BEV) maps are a popular perceptual representation for ground robot navigation due to their richness and flexibility for downstream

dataset, map prediction, prediction, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(7 more...)

Genre: Research Report (0.93)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images

Käs, Stephanie, Peter, Sven, Thillmann, Henrik, Burenko, Anton, Adrian, David Benjamin, Mack, Dennis, Linder, Timm, Leibe, Bastian

arXiv.org Artificial Intelligence | Jun-25-2025

Fisheye cameras offer robots the ability to capture human movements across a wider field of view (FOV) than standard pinhole cameras, making them particularly useful for applications in human-robot interaction and automotive contexts. However, accurately detecting human poses in fisheye images is challenging due to the curved distortions inherent to fisheye optics. While various methods for undistorting fisheye images have been proposed, their effectiveness and limitations for poses that cover a wide FOV have not been systematically evaluated in the context of absolute human pose estimation from monocular fisheye images. To address this gap, we evaluate the impact of pinhole, equidistant and double sphere camera models, as well as cylindrical projection methods, on 3D human pose estimation accuracy. We find that in close-up scenarios, pinhole projection is inadequate, and the optimal projection method varies with the FOV covered by the human pose. The use of advanced fisheye models like the double sphere model significantly enhances 3D human pose estimation accuracy. We propose a heuristic for selecting the appropriate projection model based on the detection bounding box to enhance prediction quality. Additionally, we introduce and evaluate on our novel dataset FISHnCHIPS, which features 3D human skeleton annotations in fisheye images, including images from unconventional angles, such as extreme close-ups, ground-mounted cameras, and wide-FOV poses, available at: https://www.vision.rwth-aachen.de/fishnchips
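For readers unfamiliar with the three camera models named in the abstract above, the following is a minimal sketch of their forward projections (3D point in the camera frame to pixel coordinates). It uses the commonly published textbook forms of the pinhole, equidistant and double sphere models; it is not the paper's code. The function names, the parameter names (`fx`, `fy`, `cx`, `cy`, `xi`, `alpha`) and the simplified validity check in the double sphere function are assumptions made for illustration.

```python
import math

def project_pinhole(point, fx, fy, cx, cy):
    """Pinhole model: perspective division by depth z (only valid for z > 0)."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("pinhole projection requires z > 0")
    return (fx * x / z + cx, fy * y / z + cy)

def project_equidistant(point, fx, fy, cx, cy):
    """Equidistant fisheye model: image radius grows linearly with the
    angle theta between the ray and the optical axis (r = f * theta)."""
    x, y, z = point
    rho = math.hypot(x, y)            # distance from the optical axis
    if rho == 0.0:
        return (cx, cy)               # point on the axis maps to the principal point
    theta = math.atan2(rho, z)        # incidence angle, well defined beyond 90 degrees
    return (fx * theta * x / rho + cx, fy * theta * y / rho + cy)

def project_double_sphere(point, fx, fy, cx, cy, xi, alpha):
    """Double sphere model in its commonly published form.
    xi and alpha are the two extra shape parameters; with xi = 0 and
    alpha = 0 the model reduces to the pinhole model."""
    x, y, z = point
    d1 = math.sqrt(x * x + y * y + z * z)
    d2 = math.sqrt(x * x + y * y + (xi * d1 + z) ** 2)
    denom = alpha * d2 + (1.0 - alpha) * (xi * d1 + z)
    # Simplified validity check (an assumption of this sketch): reject points
    # for which the projection denominator is not positive.
    if denom <= 0.0:
        raise ValueError("point is outside the valid projection range")
    return (fx * x / denom + cx, fy * y / denom + cy)
```

A point at 90 degrees off the optical axis, such as `(1.0, 0.0, 0.0)`, makes `project_pinhole` raise an error, while `project_equidistant` still returns a finite pixel. This is one way to see why a pinhole projection cannot represent content near the edge of a very wide field of view.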

artificial intelligence, pose estimation, video understanding, (16 more...)

arXiv.org Artificial Intelligence

2506.19747

Country: Europe > Germany (0.04)

Genre: Research Report (0.82)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)

Role of Uncertainty in Model Development and Control Design for a Manufacturing Process

Li, Rongfei, Assadian, Francis

arXiv.org Artificial Intelligence | Jun-17-2025

The use of robotic technology has drastically increased in manufacturing in the 21st century. But by utilizing their sensory cues, humans still outperform machines, especially in micro-scale manufacturing, which requires high-precision robot manipulators. These sensory cues naturally compensate for the high level of uncertainty that exists in the manufacturing environment. Uncertainties in performing manufacturing tasks may come from measurement noise, model inaccuracy, joint compliance (e.g., elasticity), etc. Although the advanced metrology sensors and high-precision microprocessors used in today's robots have compensated for many structural and dynamic errors in robot positioning, a well-designed control algorithm still works as a comparable and cheaper alternative for reducing uncertainties in automated manufacturing. Our work illustrates that a multi-robot control system can greatly reduce various uncertainties.

artificial intelligence, controller, transfer function, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.5772/intechopen.104780

2506.12273

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report (0.81)

Industry: Aerospace & Defense (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.66)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.46)

AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion

Xie, Liuyue, Guo, Jiancong, Cakmakci, Ozan, Araujo, Andre, Jeni, Laszlo A., Jia, Zhiheng

arXiv.org Artificial Intelligence | Mar-27-2025

Accurate camera calibration is a fundamental task for 3D perception, especially when dealing with real-world, in-the-wild environments where complex optical distortions are common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce a novel framework that addresses these challenges by jointly modeling camera intrinsic and extrinsic parameters using a generic ray camera model. Unlike previous approaches, AlignDiff shifts focus from semantic to geometric features, enabling more accurate modeling of local distortions. We propose AlignDiff, a diffusion model conditioned on geometric priors, enabling the simultaneous estimation of camera distortions and scene geometry. To enhance distortion prediction, we incorporate edge-aware attention, focusing the model on geometric features around image edges rather than semantic content. Furthermore, to enhance generalizability to real-world captures, we incorporate a large database of ray-traced lenses containing over three thousand samples. This database characterizes the distortion inherent in a diverse variety of lens forms. Our experiments demonstrate that the proposed method significantly reduces the angular error of estimated ray bundles by ~8.2 degrees and improves overall calibration accuracy, outperforming existing approaches on challenging, real-world datasets.
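The abstract above reports its result as an angular error between ray bundles without defining the metric. As a rough illustration only, the sketch below computes the mean angle in degrees between corresponding rays of a predicted bundle and a reference bundle. The pairing of rays by index, the use of a plain mean and the function names are assumptions of this sketch, not details taken from the paper.

```python
import math

def angular_error_deg(ray_a, ray_b):
    """Angle in degrees between two 3D ray directions (need not be unit length)."""
    dot = sum(a * b for a, b in zip(ray_a, ray_b))
    norm_a = math.sqrt(sum(a * a for a in ray_a))
    norm_b = math.sqrt(sum(b * b for b in ray_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("ray directions must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

def mean_angular_error_deg(predicted_rays, reference_rays):
    """Mean per-ray angular error over two bundles paired by index
    (the pairing and the averaging are assumptions of this sketch)."""
    if len(predicted_rays) != len(reference_rays) or not predicted_rays:
        raise ValueError("bundles must be non-empty and of equal length")
    errors = [angular_error_deg(p, r) for p, r in zip(predicted_rays, reference_rays)]
    return sum(errors) / len(errors)
```

In a generic ray camera model each pixel is described by its own ray direction, so a per-ray angular metric of this kind can be applied without assuming any particular parametric lens model.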

aberration, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.21581

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Media > Photography (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Tacchi 2.0: A Low Computational Cost and Comprehensive Dynamic Contact Simulator for Vision-based Tactile Sensors

Sun, Yuhao, Zhang, Shixin, Li, Wenzhuang, Zhao, Jie, Shan, Jianhua, Shen, Zirong, Chen, Zixi, Sun, Fuchun, Guo, Di, Fang, Bin

arXiv.org Artificial Intelligence | Mar-12-2025

With the development of robotics technology, some tactile sensors, such as vision-based sensors, have been applied to contact-rich robotics tasks. However, the limited durability of vision-based tactile sensors significantly increases the cost of tactile information acquisition. Utilizing simulation to generate tactile data has emerged as a reliable approach to address this issue. While data-driven methods for tactile data generation lack robustness, approaches based on finite element methods (FEM) incur significant computational costs. To address these issues, we integrated a pinhole camera model into Tacchi, a low-computational-cost vision-based tactile simulator that uses the Material Point Method (MPM) as its simulation method, completing the simulation of marker motion images. We upgraded Tacchi and introduced Tacchi 2.0. This simulator can simulate tactile images, marker motion images, and joint images under different motion states like pressing, slipping, and rotating. Experimental results demonstrate the reliability of our method and its robustness across various vision-based tactile sensors.

sensor, simulation, tactile sensor, (13 more...)

arXiv.org Artificial Intelligence

2503.091

Country:

Asia > China > Beijing > Beijing (0.06)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

AKiRa: Augmentation Kit on Rays for optical video generation

Wang, Xi, Courant, Robin, Christie, Marc, Kalogeiton, Vicky

arXiv.org Artificial Intelligence | Dec-29-2024

Recent advances in text-conditioned video diffusion have greatly improved video quality. However, these methods offer limited or sometimes no control to users on camera aspects, including dynamic camera motion, zoom, distorted lens and focus shifts. These motion and optical aspects are crucial for adding controllability and cinematic elements to generation frameworks, ultimately resulting in visual content that draws focus, enhances mood, and guides emotions according to filmmakers' controls. In this paper, we aim to close the gap between controllable video generation and camera optics. To achieve this, we propose AKiRa (Augmentation Kit on Rays), a novel augmentation framework that builds and trains a camera adapter with a complex camera model over an existing video generation backbone. It enables fine-tuned control over camera motion as well as complex optical parameters (focal length, distortion, aperture) to achieve cinematic effects such as zoom, fisheye effect, and bokeh. Extensive experiments demonstrate AKiRa's effectiveness in combining and composing camera optics while outperforming all state-of-the-art methods. This work sets a new landmark in controlled and optically enhanced video generation, paving the way for future optical video generation methods.

akira, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.14158

Genre: Research Report (1.00)

Industry:

Media > Photography (1.00)
Media > Film (1.00)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)

MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors

Murai, Riku, Dexheimer, Eric, Davison, Andrew J.

arXiv.org Artificial Intelligence | Dec-16-2024

We present a real-time monocular dense SLAM system designed bottom-up from MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this strong prior, our system is robust on in-the-wild video sequences despite making no assumption on a fixed or parametric camera model beyond a unique camera centre. We introduce efficient methods for pointmap matching, camera tracking and local fusion, graph construction and loop closure, and second-order global optimisation. With known calibration, a simple modification to the system achieves state-of-the-art performance across various benchmarks. Altogether, we propose a plug-and-play monocular SLAM system capable of producing globally-consistent poses and dense geometry while operating at 15 FPS.

artificial intelligence, machine learning, real time system, (19 more...)

arXiv.org Artificial Intelligence

2412.12392

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Architecture > Real Time Systems (0.86)
Information Technology > Artificial Intelligence > Robots (0.68)

Bio-inspired reconfigurable stereo vision for robotics using omnidirectional cameras

Chen, Suchang, Fan, Dongliang, Feng, Huijuan, Dai, Jian S

arXiv.org Artificial Intelligence | Oct-11-2024

This work introduces a novel bio-inspired reconfigurable stereo vision system for robotics, leveraging omnidirectional cameras and a novel algorithm to achieve flexible visual capabilities. Inspired by the adaptive vision of various species, our visual system addresses traditional stereo vision limitations, i.e., immutable camera alignment with narrow fields of view, by introducing a reconfigurable stereo vision system to robotics. Our key innovations include the reconfigurable stereo vision strategy that allows dynamic camera alignment, a robust depth measurement system utilizing a nonrectified geometrical method combined with a deep neural network for feature matching, and a geometrical compensation technique to enhance visual accuracy. Implemented on a metamorphic robot, this vision system demonstrates its great adaptability to various scenarios by switching its configurations of 316° monocular with 79° binocular field for fast target seeking and 242° monocular with 150° binocular field for detailed close inspection.

artificial intelligence, machine learning, vision system, (20 more...)

arXiv.org Artificial Intelligence

2410.08691

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Thailand (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

P2U-SLAM: A Monocular Wide-FoV SLAM System Based on Point Uncertainty and Pose Uncertainty

Zhang, Yufan, Yang, Kailun, Wang, Ze, Wang, Kaiwei

arXiv.org Artificial Intelligence | Sep-16-2024

This paper presents P2U-SLAM, a visual Simultaneous Localization And Mapping (SLAM) system with a wide Field of View (FoV) camera, which utilizes pose uncertainty and point uncertainty. While the wide FoV enables considerable repetitive observations of historical map points for matching cross-view features, the data properties of the historical map points and the poses of historical keyframes have changed during the optimization process. The neglect of data property changes triggers the absence of a partial information matrix in optimization and leads to the risk of long-term positioning performance degradation. The purpose of our research is to reduce the risk of the wide field of view visual input to the SLAM system. Based on the conditional probability model, this work reveals the definite impact of the above data properties changes on the optimization process, concretizes it as point uncertainty and pose uncertainty, and gives a specific mathematical form. P2U-SLAM respectively embeds point uncertainty and pose uncertainty into the tracking module and local mapping, and updates these uncertainties after each optimization operation including local mapping, map merging, and loop closing. We present an exhaustive evaluation in 27 sequences from two popular public datasets with wide-FoV visual input. P2U-SLAM shows excellent performance compared with other state-of-the-art methods. The source code will be made publicly available at https://github.com/BambValley/P2U-SLAM.

p2u-slam, pose uncertainty, sequence, (16 more...)

arXiv.org Artificial Intelligence

2409.10143

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
