Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting

Neural Information Processing Systems 

In detail, we utilize foundation visual perception models to obtain each generated object from the image and leverage pre-trained depth estimation models to lift the generated 2D image to 3D space.