Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering
