C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction

Jun-17-2026, 13:12:56 GMT–Neural Information Processing Systems

Geometric models like DUSt3R have made great advances in understanding the geometry of a scene from pairs of photos. However, they fail when the inputs differ greatly from what was observed during training, whether in viewpoint (e.g., aerial vs. ground) or in modality (e.g., photos vs. abstract drawings). This paper addresses a challenging version of this problem: predicting correspondences between ground-level photos and floor plans. Current datasets for joint photo-floor plan reasoning are limited: they lack either variation in modality (VIGOR) or correspondences (WAFFLE). To address these limitations, we introduce a new dataset, C3, created by first reconstructing a number of scenes in 3D from Internet photo collections via structure-from-motion, then manually registering the reconstructions to floor plans gathered from the Internet. From these registered reconstructions we can derive correspondences between images and floor plans.
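The abstract does not spell out how correspondences are derived from a registered reconstruction, so the following is only a minimal sketch of one plausible way to do it, not the authors' pipeline. It assumes a pinhole camera with intrinsics `K` and pose `(R, t)` from structure-from-motion, a world frame whose Z axis points up, and a 2D similarity transform (`scale`, `theta`, `offset`) standing in for the manual registration to the floor plan. All function and parameter names are hypothetical, and occlusion is ignored.

```python
import numpy as np

def project_to_image(points_world, K, R, t):
    """Pinhole projection of world points (N, 3) to pixels (N, 2) plus depth (N,)."""
    cam = points_world @ R.T + t          # world frame -> camera frame
    depth = cam[:, 2]
    uvw = cam @ K.T                       # apply intrinsics
    pixels = uvw[:, :2] / uvw[:, 2:3]     # perspective divide
    return pixels, depth

def project_to_plan(points_world, scale, theta, offset):
    """Drop the height (assumed world Z) and apply a 2D similarity transform."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return scale * (points_world[:, :2] @ rot.T) + offset

def photo_plan_correspondences(points_world, K, R, t, image_size,
                               scale, theta, offset):
    """Return matched (photo pixel, floor-plan pixel) pairs for visible points."""
    pixels, depth = project_to_image(points_world, K, R, t)
    width, height = image_size
    # Keep points in front of the camera that land inside the image bounds.
    visible = (
        (depth > 0)
        & (pixels[:, 0] >= 0) & (pixels[:, 0] < width)
        & (pixels[:, 1] >= 0) & (pixels[:, 1] < height)
    )
    plan = project_to_plan(points_world, scale, theta, offset)
    return pixels[visible], plan[visible]

# Toy example: camera at the origin looking along world +Y, image "down" = world -Z.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
t = np.zeros(3)
points = np.array([[0.0, 5.0, 0.0], [1.0, 5.0, 0.0], [0.0, -5.0, 0.0]])
photo_px, plan_px = photo_plan_correspondences(
    points, K, R, t, (100, 100), 10.0, 0.0, np.array([100.0, 200.0])
)
```

In this toy setup the third point lies behind the camera and is filtered out, leaving two photo-to-plan pairs. A real pipeline would also need a visibility test against scene geometry, which this sketch omits.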

artificial intelligence, floor plan, machine learning

Country:
- Europe (0.93)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Banking & Finance (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Machine Learning > Neural Networks (0.46)
