3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey

da Silveira, Thiago Lopes Trugillo, Pinto, Paulo Gamarra Lessa, Llerena, Jeffri Erwin Murrugarra, Jung, Claudio Rosito

arXiv.org Artificial Intelligence 

The world is three-dimensional (3D). As such, recovering 3D information about real-world objects enables many relevant applications, including self-driving cars [1, 2], robot navigation [3, 4], virtual tourism [5, 6], infrastructure inspection [7, 8], archaeological [9, 10] and architectural modeling [5, 11], city planning [12, 13], and 3D cinema [14, 15]. Many sensors, such as light detection and ranging (LiDAR) [16], structured light [17], and time of flight [18], can be used to obtain 3D data from real objects. There is also a plethora of approaches for inferring 3D information from plain color images and videos. The widespread accessibility and low cost of consumer cameras are a strong motivation for the continued research efforts devoted to image-based 3D scene reconstruction methods [19]. In theory, 3D information can be inferred only from two or more captures of the scene, as in typical multi-view stereo [20] or structure from motion [21] approaches. However, recent approaches explore machine learning to perform single-image depth inference [22, 23, 24]. Most methods developed so far rely on traditional perspective/pinhole-based cameras, which have a narrow field of view (FoV) and thus might require thousands of captures to model large scenes [25, 26].