Retrieval-guided Cross-view Image Synthesis
Yang, Hongji, Li, Yiru, Zhu, Yingying
–arXiv.org Artificial Intelligence
Cross-view image synthesis involves generating new images of a scene from a different viewpoint, given an input image captured from another viewpoint. Despite recent advancements, existing methods have several limitations: 1) reliance on additional data such as semantic segmentation maps, or on preprocessing modules, to bridge the domain gap; 2) insufficient focus on view-specific semantics, leading to compromised image quality and realism; and 3) a lack of diverse datasets representing complex urban environments. To tackle these challenges, we propose: 1) a novel retrieval-guided framework that employs a retrieval network as an embedder to address the domain gap; 2) an innovative generator that enhances semantic consistency and diversity specific to the target view, improving image quality and realism; and 3) a new dataset, VIGOR-GEN, providing diverse cross-view image pairs in urban settings. Extensive experiments on the well-known CVUSA and CVACT datasets and the new VIGOR-GEN dataset demonstrate that our method generates images of superior realism, significantly outperforming current leading approaches, particularly on SSIM and FID.

Cross-view image synthesis aims to generate images of a scene from a viewpoint that differs from the one originally captured, synthesizing images from a given view (e.g., aerial or bird's-eye view) to a target view (e.g., street or ground view). It offers a wide range of applications, such as autonomous driving, robot navigation, 3D reconstruction Mahmud et al. (2020), virtual/augmented reality Bischke et al. (2016), and urban planning. In this paper, we probe ground-to-aerial and aerial-to-ground view synthesis from a given source-view image (as illustrated in the upper half of Figure 1).
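The retrieval-guided idea above can be sketched as a data flow: a network trained for cross-view image retrieval is frozen and reused as an embedder, and its source-view embedding conditions the generator. The Python sketch below is purely illustrative; the linear "embedder", the function names, and all shapes are assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen cross-view retrieval network's feature
# extractor: a fixed linear map into a 64-dim embedding space.
# (Illustrative assumption; the real embedder is a deep network.)
W_embed = rng.standard_normal((64, 256 * 256 * 3)) * 0.01

def embed(image):
    """Map an image into a shared cross-view embedding space."""
    return W_embed @ image.ravel()

def generate(noise, embedding):
    """Toy 'generator': conditions noise on the source-view embedding.
    Real systems decode this conditioning with a GAN-style generator."""
    return noise + embedding.mean()

aerial = rng.random((256, 256, 3))           # source (aerial) view
z = embed(aerial)                            # frozen retrieval features
ground = generate(rng.random((128, 512, 3)), z)  # target (ground) view
```

The point of the sketch is the wiring, not the modules: the domain gap is handled by embedding both views into the retrieval network's shared space before generation.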
This task presents significant challenges: the model must comprehend and interpret the scene's geometry and object appearances from one view, and then reconstruct or generate a realistic image from a different viewpoint. While promising, existing cross-view image synthesis methods face several key challenges. They often rely on extra information such as semantic segmentation maps Regmi & Borji (2018); Tang et al. (2019); Wu et al. (2022), or on preprocessing modules such as the polar transformation Lu et al. (2020); Toker et al. (2021); Shi et al. (2022), to bridge the domain gap between views.
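To make the polar-transformation preprocessing concrete, here is a minimal NumPy sketch of the kind of aerial-to-ground warp used in prior work: output rows correspond to radial distance from the aerial image's centre, and columns sweep the full 360° azimuth, so content roughly lines up with a street-view panorama. The exact sampling formula, output size, and nearest-neighbour interpolation are illustrative assumptions; published implementations differ in detail.

```python
import numpy as np

def polar_transform(aerial, height=128, width=512):
    """Warp a square aerial image into a panorama-like layout.

    Row i of the output samples a circle of shrinking radius around the
    aerial image centre; column j sets the azimuth angle on that circle.
    """
    S = aerial.shape[0]                    # assume a square S x S aerial image
    i = np.arange(height)[:, None]         # output row -> radius
    j = np.arange(width)[None, :]          # output column -> azimuth
    radius = (S / 2.0) * (height - 1 - i) / height
    theta = 2.0 * np.pi * j / width
    x = S / 2.0 + radius * np.sin(theta)   # aerial-image column to sample
    y = S / 2.0 - radius * np.cos(theta)   # aerial-image row to sample
    x = np.clip(np.rint(x), 0, S - 1).astype(int)
    y = np.clip(np.rint(y), 0, S - 1).astype(int)
    # Nearest-neighbour sampling; fancy indexing broadcasts over channels.
    return aerial[y, x]

aerial = np.random.rand(256, 256, 3)
panorama = polar_transform(aerial)
print(panorama.shape)  # (128, 512, 3)
```

Note that the bottom row of the panorama collapses to the aerial centre pixel (radius 0), which is why such warps are only a rough geometric prior and still leave a substantial appearance gap for the generator to close.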
Nov-29-2024