How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?
