

 retrieval network


Retrieval-guided Cross-view Image Synthesis

arXiv.org Artificial Intelligence

Cross-view image synthesis involves generating new images of a scene from different viewpoints, given an input image taken from another viewpoint. Despite recent advancements, existing methods have several limitations: 1) reliance on additional data such as semantic segmentation maps, or on preprocessing modules, to bridge the domain gap; 2) insufficient focus on view-specific semantics, leading to compromised image quality and realism; and 3) a lack of diverse datasets representing complex urban environments. To tackle these challenges, we propose: 1) a novel retrieval-guided framework that employs a retrieval network as an embedder to address the domain gap; 2) an innovative generator that enhances semantic consistency and diversity specific to the target view to improve image quality and realism; and 3) a new dataset, VIGOR-GEN, providing diverse cross-view image pairs in urban settings. Extensive experiments on the well-known CVUSA and CVACT datasets and the new VIGOR-GEN dataset demonstrate that our method generates images of superior realism, significantly outperforming current leading approaches, particularly in SSIM and FID evaluations.

Cross-view image synthesis aims to generate an image from a viewpoint that differs from that of the original image: it translates a given view (e.g., aerial or bird's-eye view) into a target view (e.g., street or ground view), even when the target viewpoint was not originally captured. It offers a wide range of applications, such as autonomous driving, robot navigation, 3D reconstruction Mahmud et al. (2020), virtual/augmented reality Bischke et al. (2016), and urban planning. In this paper, we probe into ground-to-aerial / aerial-to-ground view synthesis based on a given source-view image (as illustrated in the upper half of Figure 1).
This task presents significant challenges, as it requires the model to comprehend and interpret the scene's geometry and object appearances from one view, and then reconstruct or generate a realistic image from a different viewpoint. Although existing cross-view image synthesis methods are promising, they face several key challenges. They often rely on extra information such as semantic segmentation maps Regmi & Borji (2018); Tang et al. (2019); Wu et al. (2022) or on preprocessing modules such as polar transformation Lu et al. (2020); Toker et al. (2021); Shi et al. (2022) to bridge the domain gap between different views.


Deep Reinforcement Learning with Multitask Episodic Memory Based on Task-Conditioned Hypernetwork

arXiv.org Artificial Intelligence

Deep reinforcement learning algorithms are usually impeded by sampling inefficiency: they depend heavily on many interactions with the environment to acquire accurate decision-making capabilities. In contrast, humans rely on their hippocampus to retrieve relevant information from past experiences of related tasks, which guides their decision-making when learning a new task, rather than depending exclusively on environmental interactions. Nevertheless, designing a hippocampus-like module that lets an agent incorporate past experiences into established reinforcement learning algorithms presents two challenges. The first is selecting the most relevant past experiences for the current task, and the second is integrating such experiences into the decision network. To address these challenges, we propose a novel method that utilizes a retrieval network based on a task-conditioned hypernetwork, which adapts the retrieval network's parameters depending on the task. At the same time, a dynamic modification mechanism enhances the collaboration between the retrieval and decision networks. We evaluate the proposed method on the MiniGrid environment. The experimental results demonstrate that our proposed method significantly outperforms strong baselines.
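
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of a task-conditioned hypernetwork driving retrieval, not the authors' method. It assumes a linear hypernetwork, a linear retrieval encoder, and cosine-similarity ranking over a small memory; all function names (`make_hypernetwork`, `generate_encoder_weights`, `retrieve`) and dimensions are hypothetical, and nothing is trained.

```python
import math
import random


def make_hypernetwork(task_dim, in_dim, out_dim, seed=0):
    """Random linear hypernetwork (assumption): one row per encoder weight."""
    rng = random.Random(seed)
    n_params = in_dim * out_dim
    return [[rng.gauss(0.0, 0.1) for _ in range(task_dim)] for _ in range(n_params)]


def generate_encoder_weights(hyper, task_embedding, in_dim, out_dim):
    """Map a task embedding to the weight matrix of the retrieval encoder."""
    flat = [sum(w * t for w, t in zip(row, task_embedding)) for row in hyper]
    # Reshape the flat parameter vector into out_dim rows of in_dim weights.
    return [flat[i * in_dim:(i + 1) * in_dim] for i in range(out_dim)]


def encode(weights, x):
    """Apply the generated linear encoder to one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve(hyper, task_embedding, query, memory, k, in_dim, out_dim):
    """Return indices of the k memory entries most similar to the query,
    measured in the embedding space generated for this task."""
    weights = generate_encoder_weights(hyper, task_embedding, in_dim, out_dim)
    q = encode(weights, query)
    scored = [(cosine(q, encode(weights, m)), i) for i, m in enumerate(memory)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    hyper = make_hypernetwork(task_dim=2, in_dim=4, out_dim=3)
    memory = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    query = [0.0, 0.0, 1.0, 0.0]
    print(retrieve(hyper, [1.0, 0.0], query, memory, k=2, in_dim=4, out_dim=3))
```

Because the encoder weights are a function of the task embedding, the same memory can be ranked differently for different tasks. How the retrieved experiences are then fed into the decision network (the "dynamic modification mechanism") is not described in the abstract and is left out of this sketch.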