FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Exploration

Neural Information Processing Systems 

Generating flexible-view 3D scenes, including 360° rotation and zooming, from a single image is challenging due to a lack of 3D data. To this end, we introduce FlexWorld, a novel framework that progressively constructs a persistent 3D Gaussian splatting representation by synthesizing new 3D content and integrating it into the scene. To handle novel view synthesis under large camera variations, we leverage an advanced pre-trained video model fine-tuned on training pairs derived from accurate depth estimation.
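
To make the idea of progressive integration concrete, the following is a minimal sketch and not the FlexWorld implementation. It assumes each synthesized view comes with a depth map, lifts that depth map to 3D points with a standard pinhole camera model, and appends the points to a persistent scene. The names `backproject` and `PersistentScene`, the toy intrinsics, and the use of a plain point list in place of 3D Gaussians are all illustrative assumptions. Points are left in camera space; a real pipeline would also apply each view's camera pose.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Lift a depth map to camera-space 3D points with a pinhole model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # skip pixels with invalid depth
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


@dataclass
class PersistentScene:
    """Hypothetical stand-in for the persistent 3D representation."""
    points: List[Point] = field(default_factory=list)

    def integrate(self, new_points: List[Point]) -> int:
        """Append new points and return the updated scene size."""
        self.points.extend(new_points)
        return len(self.points)


scene = PersistentScene()
# Two toy 2x2 depth maps stand in for the depth of synthesized views.
frames = [[[1.0, 1.0], [2.0, 0.0]], [[1.5, 1.5], [1.5, 1.5]]]
for depth in frames:
    scene.integrate(backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
```

The scene object persists across iterations, so each new view only adds content instead of rebuilding the representation from scratch.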