MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
Yu, Huangyue, Jia, Baoxiong, Chen, Yixin, Yang, Yandan, Li, Puhao, Su, Rongpeng, Li, Jiaxin, Li, Qing, Liang, Wei, Zhu, Song-Chun, Liu, Tengyu, Huang, Siyuan
arXiv.org Artificial Intelligence
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to effectively support skill acquisition, sim-to-real transfer, and generalization. Achieving these quality standards, however, necessitates the precise replication of real-world object diversity. Existing datasets demonstrate that this process relies heavily on artist-driven designs, which demand substantial human effort and present significant scalability challenges. To produce realistic and interactive 3D scenes at scale, we first present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans, which includes 15,366 objects spanning 831 fine-grained categories. Then, we introduce Scan2Sim, a robust multi-modal alignment model that enables automated, high-quality asset replacement, thereby eliminating the reliance on artist-driven designs for scaling 3D scenes. We further propose two benchmarks to evaluate MetaScenes: a detailed scene synthesis task focused on small-item layouts for robotic manipulation, and a domain transfer task in vision-and-language navigation (VLN) to validate cross-domain transfer. Results confirm MetaScenes' potential to enhance EAI by supporting more generalizable agent learning and sim-to-real applications, introducing new possibilities for EAI research. Project website: https://meta-scenes.github.io/.
May-6-2025
- Country:
- Asia
- China > Beijing
- Beijing (0.04)
- Middle East > Israel
- Mediterranean Sea (0.04)
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Education (0.48)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Representation & Reasoning > Object-Oriented Architecture (0.68)
- Robots (1.00)
- Vision (1.00)
- Sensing and Signal Processing > Image Processing (0.67)