Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
