Large Spatial Model: End-to-end Unposed Images to Semantic 3D

Neural Information Processing Systems 

Reconstructing and understanding 3D structures from a limited number of images is a classical problem in computer vision.