Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding
