PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers
Weizhe Lin, Jingbiao Mei, Jinghong Chen, Bill Byrne
arXiv.org Artificial Intelligence
Large Multimodal Models (LMMs) excel in natural language and visual understanding but are challenged by exacting tasks such as Knowledge-based Visual Question Answering (KB-VQA), which require retrieving relevant information from document collections and using it to shape answers to questions. We present M2KR, an extensive training and evaluation framework for KB-VQA. M2KR comprises a collection of vision and language tasks that we have incorporated into a single suite of benchmark tasks for training and evaluating general-purpose multi-modal retrievers. We use M2KR to develop PreFLMR, a pre-trained version of the recently developed Fine-grained Late-interaction Multi-modal Retriever (FLMR) approach to KB-VQA, and we report new state-of-the-art results across a range of tasks. We also investigate the scaling behaviors of PreFLMR, with the aim of informing future development of general-purpose multi-modal retrievers.
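The abstract names late interaction without spelling out the scoring rule. As a rough illustration only, the sketch below assumes the ColBERT-style "MaxSim" formulation commonly used by late-interaction retrievers: each query token embedding is matched to its most similar document token embedding, and the per-token maxima are summed. The function names (`dot`, `late_interaction_score`, `rank_documents`) and the toy 2-dimensional embeddings are hypothetical; in a multi-modal setting the query embeddings would presumably include both text-token and image-derived vectors, which this sketch treats as one flat list.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_embs: List[Vector], doc_embs: List[Vector]) -> float:
    """Assumed MaxSim score: sum over query tokens of the best-matching doc token."""
    if not query_embs or not doc_embs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


def rank_documents(
    query_embs: List[Vector], docs: List[List[Vector]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from highest to lowest score."""
    scored = [(i, late_interaction_score(query_embs, d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example: two query token embeddings, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches each query token exactly
doc_b = [[0.5, 0.5]]              # a single token that partially matches both
ranking = rank_documents(query, [doc_a, doc_b])
print(ranking)
```

Because the score keeps one vector per token rather than pooling everything into a single embedding, a document can be rewarded for matching individual query tokens, which is the "fine-grained" aspect the title refers to.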
Feb-13-2024