\textit{Bifr\"ost}: 3D-Aware Image Compositing with Language Instructions

Neural Information Processing Systems 

This paper introduces $\textit{Bifröst}$, a novel 3D-aware framework built upon diffusion models to perform instruction-based image compositing. Previous methods concentrate on image compositing at the 2D level and therefore fall short in handling complex spatial relationships ($\textit{e.g.}$, occlusion).