CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets
