PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation

Dec-24-2025, 22:33:34 GMT–Neural Information Processing Systems

Vision-and-Language Navigation (VLN) requires an agent to follow language instructions to navigate through 3D environments. One main challenge in VLN is the limited availability of photorealistic training environments, which makes it hard for agents to generalize to unseen environments. To address this problem, we propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text. Specifically, we collect room descriptions by captioning the room images in existing Matterport3D environments, and use a state-of-the-art text-to-image diffusion model to generate new panoramic environments from those descriptions. We then apply recursive outpainting to the generated images to create consistent 360-degree panorama views.
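The recursive outpainting step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `stride` parameter, and the column-based image representation are assumptions made for illustration, and `toy_fill` is a hypothetical stand-in for a call to a text-conditioned diffusion inpainting model. The sketch only shows the control flow: slide a fixed-width window, keep the overlapping columns as context, mask the new columns, and let the model fill them in.

```python
from typing import Callable, List, Optional

Column = int  # stand-in for one image column; a real system would use pixel arrays
Window = List[Optional[Column]]  # None marks a masked column to be generated


def outpaint_panorama(
    seed_view: List[Column],
    pano_width: int,
    stride: int,
    fill_masked: Callable[[Window, str], List[Column]],
    caption: str,
) -> List[Column]:
    """Extend seed_view to the right until it spans pano_width columns."""
    view_width = len(seed_view)
    if not 0 < stride < view_width:
        raise ValueError("stride must leave some overlap with the previous view")
    panorama = list(seed_view)
    while len(panorama) < pano_width:
        step = min(stride, pano_width - len(panorama))
        # Keep the trailing columns as known context and mask the new ones.
        context = panorama[-(view_width - step):]
        window: Window = context + [None] * step
        filled = fill_masked(window, caption)
        # Only the newly generated columns are appended to the panorama.
        panorama.extend(filled[-step:])
    return panorama


def toy_fill(window: Window, caption: str) -> List[Column]:
    """Hypothetical model stub: fills each masked column by continuing a count."""
    out: List[Column] = []
    for value in window:
        out.append(out[-1] + 1 if value is None else value)
    return out
```

Calling `outpaint_panorama([0, 1, 2, 3], 10, 2, toy_fill, "a bedroom")` grows the four-column seed two columns at a time until it is ten columns wide. The sketch does not handle closing the seam where a 360-degree panorama wraps around; a real pipeline would also need the last window to condition on the first view.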

artificial intelligence, machine learning, panoramic environment

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)