NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
