scene reconstruction
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > Promising Solution (0.66)
Physically Plausible Neural Scene Reconstruction
We address the issue of physical implausibility in multi-view neural reconstruction. While implicit representations have gained popularity in multi-view 3D reconstruction, previous works struggle to yield physically plausible results, limiting their utility in domains that require rigorous physical accuracy.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Massachusetts (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Coherent 3D Scene Diffusion From a Single RGB Image
We present a novel diffusion-based approach for coherent 3D scene reconstruction from a single RGB image. Our method utilizes an image-conditioned 3D scene diffusion model to simultaneously denoise the 3D poses and geometries of all objects within the scene. Motivated by the ill-posed nature of the task and to obtain consistent scene reconstruction results, we learn a generative scene prior by conditioning on all scene objects simultaneously to capture scene context and by allowing the model to learn inter-object relationships throughout the diffusion process. We further propose an efficient surface alignment loss to facilitate training even in the absence of full ground-truth annotation, which is common in publicly available datasets. This loss leverages an expressive shape representation, which enables direct point sampling from intermediate shape predictions. By framing 3D scene reconstruction from a single RGB image as a conditional diffusion process, our approach surpasses current state-of-the-art methods, achieving a 12.04% improvement in AP3D on SUN RGB-D and a 13.43% increase in F-Score on Pix3D.
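As a rough sketch of what such a surface alignment loss can look like (the function name, tensor shapes, and the one-sided Chamfer formulation here are illustrative assumptions, not the paper's implementation), points sampled from an intermediate shape prediction are pulled toward partially observed surface points:

```python
import torch

def surface_alignment_loss(pred_points, observed_points):
    # Hypothetical one-sided Chamfer-style loss: each point sampled from the
    # intermediate shape prediction is matched to its nearest observed
    # surface point, so no complete ground-truth shape is required.
    # pred_points: (B, N, 3) samples from the predicted shape
    # observed_points: (B, M, 3) partial surface observations
    dists = torch.cdist(pred_points, observed_points)  # (B, N, M) pairwise distances
    return dists.min(dim=2).values.mean()
```

Because the shape representation allows direct point sampling, a loss of this form stays differentiable with respect to the intermediate predictions.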
Inner-Outer Aware Reconstruction Model for Monocular 3D Scene Reconstruction
Monocular 3D scene reconstruction aims to recover the 3D structure of a scene from posed images. Recent volumetric methods directly predict a truncated signed distance function (TSDF) volume and have achieved promising results. However, the memory cost of volumetric methods grows cubically with volume size, so a coarse-to-fine strategy is necessary to keep memory manageable. Specifically, the coarse-to-fine strategy distinguishes surface voxels from non-surface voxels, and only potential surface voxels are considered in the succeeding refinement. However, non-surface voxels have varied features; in particular, voxels on the inner side of the surface differ markedly from those on the outer side, since an intrinsic gap exists between them.
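To make the coarse-to-fine pruning concrete, here is a minimal sketch, assuming a predicted coarse TSDF volume (the names and the 2x refinement factor are assumptions):

```python
import torch

def sparsify_coarse_to_fine(tsdf_coarse, truncation=1.0):
    # A voxel is kept for the finer level only if its predicted truncated
    # signed distance lies inside the truncation band, i.e. it is a
    # potential surface voxel; everything else is pruned to save memory.
    # tsdf_coarse: (D, H, W) coarse TSDF predictions
    keep = tsdf_coarse.abs() < truncation
    # Each surviving coarse voxel expands into a 2x2x2 block at the finer level.
    keep_fine = (keep.repeat_interleave(2, dim=0)
                     .repeat_interleave(2, dim=1)
                     .repeat_interleave(2, dim=2))
    return keep_fine  # boolean mask selecting voxels to refine next
```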
Panoptic 3D Scene Reconstruction From a Single RGB Image
Richly segmented 3D scene reconstructions are an integral basis for many high-level scene understanding tasks, such as robotics, motion planning, or augmented reality. Existing works on 3D perception from a single RGB image tend to focus on geometric reconstruction only, or on geometric reconstruction with semantic or instance segmentation. Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction: from a single RGB image, predicting the complete geometric reconstruction of the scene in the camera frustum of the image, along with semantic and instance segmentations. We propose a new approach for holistic 3D scene understanding from a single RGB image which learns to lift and propagate 2D features from an input image to a 3D volumetric scene representation. Our panoptic 3D reconstruction metric evaluates both geometric reconstruction quality and panoptic segmentation quality. Our experiments demonstrate that our approach for panoptic 3D scene reconstruction outperforms alternative approaches for this task.
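A minimal sketch of what lifting 2D features into a 3D volume can look like, assuming known camera intrinsics and a world-to-camera transform (all names and shapes here are assumptions, not the paper's pipeline): each voxel center is projected into the image and its feature is gathered by bilinear sampling.

```python
import torch
import torch.nn.functional as F

def lift_features_to_volume(feat_2d, voxel_xyz, K, cam_T_world):
    # feat_2d: (C, H, W) image features; voxel_xyz: (N, 3) voxel centers (world)
    # K: (3, 3) intrinsics; cam_T_world: (4, 4) world-to-camera transform
    C, H, W = feat_2d.shape
    N = voxel_xyz.shape[0]
    homo = torch.cat([voxel_xyz, torch.ones(N, 1)], dim=1)       # homogeneous coords
    cam = (cam_T_world @ homo.T).T[:, :3]                        # camera-frame points
    valid = cam[:, 2] > 0                                        # in front of camera
    pix = (K @ cam.T).T
    pix = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)               # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([pix[:, 0] / (W - 1), pix[:, 1] / (H - 1)], dim=1) * 2 - 1
    feats = F.grid_sample(feat_2d[None], grid.view(1, N, 1, 2),
                          align_corners=True, padding_mode="zeros")
    feats = feats.view(C, N).T                                   # (N, C) per-voxel features
    return feats * valid[:, None]                                # zero features behind camera
```

Voxels that project outside the image receive zero features via the padding mode, so only observed regions carry 2D evidence into the volume.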
HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving
Qiang Li, Yingwenqi Jiang, Tuoxi Li, Duyu Chen, Xiang Feng, Yucheng Ao, Shangyue Liu, Xingchen Yu, Youcheng Cai, Yumeng Liu, Yuexin Ma, Xin Hu, Li Liu, Yu Zhang, Linkun Xu, Bingtao Gao, Xueyuan Wang, Shuchang Zhou, Xianming Liu, Ligang Liu
Realistic and controllable simulation is critical for advancing end-to-end autonomous driving, yet existing approaches often struggle to support novel view synthesis under large viewpoint changes or to ensure geometric consistency. We introduce HybridWorldSim, a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents (see the compositing sketch after the tags below). This unified design addresses key limitations of previous methods, enabling the creation of diverse, high-fidelity driving scenarios with reliable visual and spatial consistency. To facilitate robust benchmarking, we further release MIRROR, a new multi-traversal dataset that captures a wide range of routes and environmental conditions across different cities. Extensive experiments demonstrate that HybridWorldSim surpasses previous state-of-the-art methods, providing a practical and scalable solution for high-fidelity simulation and a valuable resource for research and development in autonomous driving.
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.83)
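As a rough, hedged illustration of the hybrid static/dynamic design (not HybridWorldSim's actual code; all names and shapes are assumptions), the sketch below composites a generated dynamic agent over a background rendered from a neural reconstruction using a per-pixel depth test:

```python
import numpy as np

def composite_frame(bg_rgb, bg_depth, agent_rgb, agent_depth, agent_mask):
    # Hypothetical compositing step: the agent wins a pixel only where it
    # is present (agent_mask) and closer to the camera than the background.
    # bg_rgb, agent_rgb: (H, W, 3); bg_depth, agent_depth: (H, W) depths;
    # agent_mask: (H, W) boolean presence mask from the generative model.
    closer = agent_mask & (agent_depth < bg_depth)
    return np.where(closer[..., None], agent_rgb, bg_rgb)
```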