AITopics | equirectangular image


Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation

Zayene, Mehdi, Endres, Jannik, Havolli, Albias, Corbière, Charles, Cherkaoui, Salim, Kontouli, Alexandre, Alahi, Alexandre

arXiv.org Artificial Intelligence, Nov-27-2024

Despite considerable progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data. We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation, consisting of 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes with diverse lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with a significantly increased label density by using depth completion. We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform decently, a significant challenge persists in accurately estimating depth in omnidirectional imaging. To address this, we introduce necessary adaptations to stereo models, achieving improved performance.
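The labelling step mentioned above, projecting 3D points onto an equirectangular image, can be illustrated with a minimal sketch. The axis convention (x right, y up, z forward), the pixel layout, and the function name `project_to_equirect` are assumptions made here for illustration; they are not taken from the Helvipad paper or its code.

```python
import math


def project_to_equirect(point, width, height):
    """Map a 3D point to (u, v, depth) on an equirectangular image.

    Assumed convention (hypothetical, not from the dataset):
    x points right, y points up, z points forward; the image centre
    looks along +z and the top row corresponds to straight up.
    """
    x, y, z = point
    depth = math.sqrt(x * x + y * y + z * z)
    if depth == 0.0:
        raise ValueError("a point at the camera centre has no direction")
    # Longitude in (-pi, pi]; 0 means straight ahead.
    lon = math.atan2(x, z)
    # Latitude in [-pi/2, pi/2]; clamp guards against rounding above 1.
    lat = math.asin(max(-1.0, min(1.0, y / depth)))
    u = (lon / (2.0 * math.pi) + 0.5) * width   # column coordinate
    v = (0.5 - lat / math.pi) * height          # row coordinate
    return u, v, depth


# A point one metre straight ahead lands at the image centre.
print(project_to_equirect((0.0, 0.0, 1.0), 2048, 1024))  # (1024.0, 512.0, 1.0)
```

Storing `depth` at the pixel nearest to `(u, v)` for every LiDAR point would give a sparse depth map under these assumptions; occlusion handling and sensor calibration are left out of the sketch.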

dataset, disparity, sequence

arXiv:2411.18335

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Reviews: Learning Spherical Convolution for Fast Features from 360 Imagery

Neural Information Processing Systems, Oct-7-2024, 13:32:08 GMT

This paper describes a method to transform networks learned on perspective images to take spherical images as input. This is an important problem as fisheye and 360-degree sensors become increasingly ubiquitous but training data is relatively scarce. The method first transforms the network architecture to adapt the filter sizes and pooling operations to convolutions on an equirectangular representation/projection. Next, the filters are learned to match the feature responses of the original network when considering the projections to the tangent plane of the respective feature response. The filters are pre-learned layer-by-layer and fine-tuned to output features as similar as possible to the original network projected to the tangent planes. Detection experiments on Pano2Vid and PASCAL demonstrate that the technique performs slightly below the optimal performance obtained with per-pixel tangent projections (while being significantly faster) and outperforms several baselines, including cube map projections.
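To make "projection to the tangent plane" concrete, here is a minimal sketch of the standard gnomonic projection between spherical coordinates and a plane tangent to the unit sphere. The function names and the longitude/latitude parameterisation are assumptions for illustration and are not taken from the reviewed paper.

```python
import math


def gnomonic_forward(lon, lat, lon0, lat0):
    """Project (lon, lat) onto the plane tangent to the unit sphere at (lon0, lat0)."""
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    if cos_c <= 0.0:
        raise ValueError("point is not on the hemisphere facing the tangent plane")
    x = math.cos(lat) * math.sin(lon - lon0) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / cos_c
    return x, y


def gnomonic_inverse(x, y, lon0, lat0):
    """Map tangent-plane coordinates (x, y) back to (lon, lat) on the sphere."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lon0, lat0
    c = math.atan(rho)  # angular distance from the tangent point
    sin_lat = (math.cos(c) * math.sin(lat0)
               + y * math.sin(c) * math.cos(lat0) / rho)
    lat = math.asin(max(-1.0, min(1.0, sin_lat)))
    lon = lon0 + math.atan2(
        x * math.sin(c),
        rho * math.cos(lat0) * math.cos(c) - y * math.sin(lat0) * math.sin(c),
    )
    return lon, lat
```

Sampling a regular grid of `(x, y)` values through `gnomonic_inverse` and looking up the corresponding equirectangular pixels yields a locally perspective patch, which is the kind of input an ordinary perspective-image network expects.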

learning spherical convolution, projection, tangent plane


Technology: Information Technology > Artificial Intelligence > Machine Learning (0.52)


Geometry Fidelity for Spherical Images

Christensen, Anders, Mojab, Nooshin, Patel, Khushman, Ahuja, Karan, Akata, Zeynep, Winther, Ole, Gonzalez-Franco, Mar, Colaco, Andrea

arXiv.org Artificial Intelligence, Jul-25-2024

Spherical or omni-directional images offer an immersive visual format appealing to a wide range of computer vision applications. However, geometric properties of spherical images pose a major challenge for models and metrics designed for ordinary 2D images. Here, we show that direct application of Fréchet Inception Distance (FID) is insufficient for quantifying geometric fidelity in spherical images. We introduce two quantitative metrics accounting for geometric constraints, namely Omnidirectional FID (OmniFID) and Discontinuity Score (DS). OmniFID is an extension of FID tailored to additionally capture field-of-view requirements of the spherical format by leveraging cubemap projections. DS is a kernel-based seam alignment score of continuity across borders of 2D representations of spherical images. In experiments, OmniFID and DS quantify geometry fidelity issues that are undetected by FID.
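The idea of checking continuity across the borders of a 2D representation can be illustrated with a deliberately naive proxy. The sketch below is not the paper's kernel-based Discontinuity Score; it only compares the left and right edge columns of an equirectangular image, which are neighbours on the sphere, against the typical difference between adjacent interior columns. The function names and the list-of-rows grayscale format are assumptions.

```python
def seam_gap(image):
    """Mean absolute difference between the first and last column.

    `image` is a grayscale image given as a list of equal-length rows.
    On a seamless equirectangular image these two columns are adjacent
    on the sphere, so the value should be small.
    """
    if not image or not image[0]:
        raise ValueError("image must contain at least one pixel")
    return sum(abs(row[0] - row[-1]) for row in image) / len(image)


def mean_neighbour_gap(image):
    """Mean absolute difference between horizontally adjacent interior pixels."""
    total, count = 0.0, 0
    for row in image:
        for left, right in zip(row, row[1:]):
            total += abs(left - right)
            count += 1
    if count == 0:
        raise ValueError("image must be at least two pixels wide")
    return total / count


wrapped = [[0, 1, 2, 0], [5, 6, 7, 5]]   # edges match: no visible seam
ramp = [[0, 1, 2, 3]]                    # edges differ: visible seam
print(seam_gap(wrapped), seam_gap(ramp))
```

A seam gap much larger than the interior neighbour gap suggests a visible discontinuity where the image wraps around.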

projection, representation, spherical image

arXiv:2407.18207

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)

FindView: Precise Target View Localization Task for Look Around Agents

Ishikawa, Haruya, Aoki, Yoshimitsu

arXiv.org Artificial Intelligence, Mar-15-2023

The field of research aims to create agents that use visual sensors to solve complex tasks or aid humans by learning to perceive, communicate, and act in their environment. Humans in the loop make the goal very difficult since the dynamics of the environment are changeable, and human interactions can lead to unexpected events. For better collaboration between agents and humans, agents must be able to localize any point in space in a way that reflects the characteristics of human perception of 3D space (Cirik et al. [2020]). Because the agents' visual sensors are commonly RGB sensors with a partial field of view (FoV), we need to train these agents to perceive how humans see from these views. Communicating with these agents will almost always require them to navigate to a common referential FoV in the scene so that the human can instruct them with shared context. A challenge arises because the point of interest could be any point in the scene, and many points in the scene will not correspond to easily named objects. So far, many embodied agents under study use either partial FoVs or panoramic images directly, and the latter are hard for human observers to understand. We believe that embodied agents should be able to look around and localize in the various views that human observers might be looking at. We approach this problem by introducing a new task, the FindView task, to evaluate and benchmark the agents (Figure 1).

artificial intelligence, machine learning, reinforcement learning

arXiv:2303.09054

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Communications (0.93)

DIGITOUR: Automatic Digital Tours for Real-Estate Properties

Chhikara, Prateek, Kuhar, Harshul, Goyal, Anil, Sharma, Chirag

arXiv.org Artificial Intelligence, Jan-16-2023

A virtual or digital tour is a form of virtual reality technology that allows a user to experience a specific location remotely. Currently, these virtual tours are created in two steps. First, a photographer captures 360-degree equirectangular images; then, a team of annotators manually links these images for the "walkthrough" user experience. The major obstacle to mass adoption of virtual tours is the time and cost of manually annotating and linking images. Therefore, this paper presents an end-to-end pipeline to automate the generation of 3D virtual tours using equirectangular images for real-estate properties. We propose a novel HSV-based coloring scheme for paper tags that need to be placed at different locations before capturing the equirectangular images with 360-degree cameras. These tags have two characteristics: i) they are numbered, which helps the photographer place them in sequence; and ii) they are bi-colored, which allows better learning of the tag detection (using the YOLOv5 architecture) and digit recognition (using a custom MobileNet architecture) tasks. Finally, we link all the equirectangular images based on the detected tags. We show the efficiency of the proposed pipeline on a real-world equirectangular image dataset collected from the Housing.com database.

artificial intelligence, equirectangular image, machine learning

doi: 10.1145/3570991.3571060

arXiv:2301.0668

Country:

Asia > India > Maharashtra > Mumbai (0.06)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry: Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.68)


Attention-Enhanced Cross-modal Localization Between 360 Images and Point Clouds

Zhao, Zhipeng, Yu, Huai, Lyv, Chenwei, Yang, Wen, Scherer, Sebastian

arXiv.org Artificial Intelligence, Dec-6-2022

Visual localization plays an important role for intelligent robots and autonomous driving, especially when the accuracy of GNSS is unreliable. Recently, camera localization in LiDAR maps has attracted more and more attention for its low cost and potential robustness to illumination and weather changes. However, the commonly used pinhole camera has a narrow Field-of-View, thus leading to limited information compared with the omni-directional LiDAR data. To overcome this limitation, we focus on correlating the information of 360 equirectangular images to point clouds, proposing an end-to-end learnable network to conduct cross-modal visual localization by establishing similarity in high-dimensional feature space. Inspired by the attention mechanism, we optimize the network to capture the salient feature for comparing images and point clouds. We construct several sequences containing 360 equirectangular images and corresponding point clouds based on the KITTI-360 dataset and conduct extensive experiments. The results demonstrate the effectiveness of our approach.

artificial intelligence, machine learning, point cloud

arXiv:2212.02757

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


PanoFlow: Learning 360° Optical Flow for Surrounding Temporal Understanding

Shi, Hao, Zhou, Yifan, Yang, Kailun, Yin, Xiaoting, Wang, Ze, Ye, Yaozu, Yin, Zhe, Meng, Shi, Li, Peng, Wang, Kaiwei

arXiv.org Artificial Intelligence, Nov-29-2022

Optical flow estimation is a basic task in self-driving and robotics systems, enabling the temporal interpretation of traffic scenes. Autonomous vehicles clearly benefit from the ultra-wide Field of View (FoV) offered by 360° panoramic sensors. However, due to the unique imaging process of panoramic cameras, models designed for pinhole images do not generalize satisfactorily to 360° panoramic images. In this paper, we put forward a novel network framework, PanoFlow, to learn optical flow for panoramic images. To overcome the distortions introduced by equirectangular projection in panoramic transformation, we design a Flow Distortion Augmentation (FDA) method, which contains radial flow distortion (FDA-R) or equirectangular flow distortion (FDA-E). We further look into the definition and properties of cyclic optical flow for panoramic videos, and propose a Cyclic Flow Estimation (CFE) method that leverages the cyclicity of spherical images to infer 360° optical flow and to convert large displacements into relatively small ones. PanoFlow is applicable to any existing flow estimation method and benefits from the progress of narrow-FoV flow estimation. In addition, we create and release a synthetic panoramic dataset, FlowScape, based on CARLA to facilitate training and quantitative analysis. PanoFlow achieves state-of-the-art performance on the public OmniFlowNet and the established FlowScape benchmarks. Our proposed approach reduces the End-Point-Error (EPE) on FlowScape by 27.3%. On OmniFlowNet, PanoFlow achieves a 55.5% error reduction from the best published result. We also qualitatively validate our method via a collection vehicle and a public real-world OmniPhotos dataset, indicating strong potential and robustness for real-world navigation applications. Code and dataset are publicly available at https://github.com/MasterHow/PanoFlow.
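The observation that cyclicity turns large displacements into small ones can be shown with a short sketch. This is only the wrap-around identity for the horizontal axis of an equirectangular image, not the paper's CFE method; the function name `wrap_horizontal_flow` is hypothetical.

```python
def wrap_horizontal_flow(flow_u, width):
    """Return the equivalent horizontal displacement in [-width/2, width/2).

    On an equirectangular image of the given width, moving `flow_u` pixels
    to the right ends at the same column as moving `flow_u - width` pixels,
    so any horizontal displacement has a shorter or equal equivalent.
    """
    half = width / 2.0
    return (flow_u + half) % width - half


# On a 1000-pixel-wide panorama, 900 px to the right equals 100 px to the left.
print(wrap_horizontal_flow(900.0, 1000))   # -100.0
print(wrap_horizontal_flow(-900.0, 1000))  # 100.0
```

The vertical axis does not wrap in this way, so only the horizontal flow component is adjusted in the sketch.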

artificial intelligence, deep learning, machine learning

arXiv:2202.13388

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.86)
