VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Open in new window