AITopics | epipolar line

In this paper, we propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior. Unlike recent prior-free MVS methods that work in a pair-wise manner, our method simultaneously considers all the source images. Specifically, we introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information within and across multi-view images.

computer vision, depth range, source image, (12 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Era3D: High-Resolution Multiview Diffusion Using Efficient Row-wise Attention

Neural Information Processing SystemsOct-10-2025, 04:41:31 GMT

In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image.

arxiv preprint arxiv, input image, reconstruction, (14 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

45ed1a72597594c097152ef9cc187762-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 00:58:07 GMT

corr, dataset, efreesplat, (16 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)
Research Report > Promising Solution (0.67)

Industry:

Information Technology (0.67)
Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

38e511a690709603d4cc3a1c52b4a9fd-Paper-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 06:18:07 GMT

computer vision, proceedings, transformer, (11 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation

Witte, Christian, Behley, Jens, Stachniss, Cyrill, Raaijmakers, Marvin

arXiv.org Artificial IntelligenceDec-2-2024

Spatial understanding of the semantics of the surroundings is a key capability needed by autonomous cars to enable safe driving decisions. Recently, purely vision-based solutions have gained increasing research interest. In particular, approaches extracting a bird's eye view (BEV) from multiple cameras have demonstrated great performance for spatial understanding. This paper addresses the dependency on learned positional encodings to correlate image and BEV feature map elements for transformer-based methods. We propose leveraging epipolar geometric constraints to model the relationship between cameras and the BEV by Epipolar Attention Fields. They are incorporated into the attention mechanism as a novel attribution term, serving as an alternative to learned positional encodings. Experiments show that our method EAFormer outperforms previous BEV approaches by 2% mIoU for map semantic segmentation and exhibits superior generalization capabilities compared to implicitly learning the camera configuration.

artificial intelligence, machine learning, segmentation, (19 more...)

arXiv.org Artificial Intelligence

2412.01595

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.35)
Information Technology > Robotics & Automation (0.35)
Automobiles & Trucks (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras

Lu, Yipeng, Zhao, Yifan, Wang, Haiping, Ruan, Zhiwei, Liu, Yuan, Dong, Zhen, Yang, Bisheng

arXiv.org Artificial IntelligenceSep-27-2024

Dashboard cameras (dashcams) record millions of driving videos daily, offering a valuable potential data source for various applications, including driving map production and updates. A necessary step for utilizing these dashcam data involves the estimation of camera poses. However, the low-quality images captured by dashcams, characterized by motion blurs and dynamic objects, pose challenges for existing image-matching methods in accurately estimating camera poses. In this study, we propose a precise pose estimation method for dashcam images, leveraging the inherent camera motion prior. Typically, image sequences captured by dash cameras exhibit pronounced motion prior, such as forward movement or lateral turns, which serve as essential cues for correspondence estimation. Building upon this observation, we devise a pose regression module aimed at learning camera motion prior, subsequently integrating these prior into both correspondences and pose estimation processes. The experiment shows that, in real dashcams dataset, our method is 22% better than the baseline for pose estimation in AUC5\textdegree, and it can estimate poses for 19% more images with less reprojection error in Structure from Motion (SfM).

correspondence, estimation, hypothesis, (14 more...)

arXiv.org Artificial Intelligence

2409.18673

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.70)

Industry: Media (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

SCENES: Subpixel Correspondence Estimation With Epipolar Supervision

Kloepfer, Dominik A., Henriques, João F., Campbell, Dylan

arXiv.org Artificial IntelligenceJan-19-2024

Extracting point correspondences from two or more views of a scene is a fundamental computer vision problem with particular importance for relative camera pose estimation and structure-from-motion. Existing local feature matching approaches, trained with correspondence supervision on large-scale datasets, obtain highly-accurate matches on the test sets. However, they do not generalise well to new datasets with different characteristics to those they were trained on, unlike classic feature extractors. Instead, they require finetuning, which assumes that ground-truth correspondences or ground-truth camera poses and 3D structure are available. We relax this assumption by removing the requirement of 3D structure, e.g., depth maps or point clouds, and only require camera pose information, which can be obtained from odometry. We do so by replacing correspondence losses with epipolar losses, which encourage putative matches to lie on the associated epipolar line. While weaker than correspondence supervision, we observe that this cue is sufficient for finetuning existing models on new data. We then further relax the assumption of known camera poses by using pose estimates in a novel bootstrapping approach. We evaluate on highly challenging datasets, including an indoor drone dataset and an outdoor smartphone camera dataset, and obtain state-of-the-art results without strong supervision.

dataset, epipolar line, supervision, (15 more...)

arXiv.org Artificial Intelligence

2401.10886

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > San Francisco County > San Francisco (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria > Styria > Graz (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)
(2 more...)

Add feedback

Filters

Collaborating Authors

epipolar line

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

770b5223b47d4304042826b29733e864-Paper-Conference.pdf

65a723bf7d8dad838c09178270d30e80-Paper-Conference.pdf

45ed1a72597594c097152ef9cc187762-Paper-Conference.pdf

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

Era3D: High-Resolution Multiview Diffusion Using Efficient Row-wise Attention

45ed1a72597594c097152ef9cc187762-Paper-Conference.pdf

38e511a690709603d4cc3a1c52b4a9fd-Paper-Conference.pdf

Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation

Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras

SCENES: Subpixel Correspondence Estimation With Epipolar Supervision