SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes
Qiu, Dicong, You, Jiadi, Gong, Zeying, Qiu, Ronghe, Xiong, Hui, Liang, Junwei
–arXiv.org Artificial Intelligence
We present the Semantics-aware Dataset and Benchmark Generation Pipeline for Open-vocabulary Object Navigation in Dynamic Scenes (SD-OVON). It utilizes pretraining multimodal foundation models to generate infinite unique photo-realistic scene variants that adhere to real-world semantics and daily commonsense for the training and the evaluation of navigation agents, accompanied with a plugin for generating object navigation task episodes compatible to the Habitat simulator. In addition, we offer two pre-generated object navigation task datasets, SD-OVON-3k and SD-OVON-10k, comprising respectively about 3k and 10k episodes of the open-vocabulary object navigation task, derived from the SD-OVON-Scenes dataset with 2.5k photo-realistic scans of real-world environments and the SD-OVON-Objects dataset with 0.9k manually inspected scanned and artist-created manipulatable object models. Unlike prior datasets limited to static environments, SD-OVON covers dynamic scenes and manipulatable objects, facilitating both real-to-sim and sim-to-real robotic applications. This approach enhances the realism of navigation tasks, the training and the evaluation of open-vocabulary object navigation agents in complex settings. To demonstrate the effectiveness of our pipeline and datasets, we propose two baselines and evaluate them along with state-of-the-art baselines on SD-OVON-3k. The datasets, benchmark and source code are publicly available.
arXiv.org Artificial Intelligence
May-27-2025
- Country:
- Asia > China
- Guangdong Province > Guangzhou (0.04)
- Hong Kong (0.04)
- Asia > China
- Genre:
- Research Report (0.40)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Large Language Model (0.95)
- Representation & Reasoning (1.00)
- Robots (0.89)
- Information Technology > Artificial Intelligence