Question Answering
From Visual Question Answering to multimodal learning: an interview with Aishwarya Agrawal
You were awarded an Honourable Mention for the 2019 AAAI / ACM SIGAI Doctoral Dissertation Award. What was the topic of your dissertation research, and what were the main contributions or findings?
My PhD dissertation was on Visual Question Answering (VQA). We proposed the task of open-ended, free-form VQA, a new way to benchmark computer vision models by asking them questions about images. We curated a large-scale dataset for researchers to train and test their models on this task.
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering (Appendix)
We chose the Google Search corpus [Luo et al., 2021] for our question-answering system as it provides good coverage of the knowledge needed and is publicly available. Therefore, it is advised to conduct an ethical review prior to deploying the system in live service. Table 1 shows the data statistics of the OK-VQA dataset. We build a DPR retriever as a baseline for FLMR. The inner product search (supported by FAISS [Johnson et al., 2019]) is used to train and [...] In answer generation, we use t5-large and Salesforce/blip2-flan-t5-xl. (Equally contributed as the first author. 37th Conference on Neural Information Processing Systems, NeurIPS 2023.)
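The inner product search mentioned in the excerpt can be illustrated with a minimal brute-force sketch. This is not the FLMR or DPR code and does not use the FAISS API: the function names `inner_product` and `top_k_by_inner_product` and the toy vectors are hypothetical, and a real system would score learned embeddings with an optimized index rather than a Python loop.

```python
from typing import List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def top_k_by_inner_product(
    query: Sequence[float],
    passages: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) for the k highest-scoring passages.

    Exhaustive search: every passage embedding is scored against the query.
    Ties are broken by the lower passage index.
    """
    scored = [(inner_product(query, p), i) for i, p in enumerate(passages)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:k]]


# Toy (hypothetical) embeddings: three passages in a 2-dimensional space.
passage_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_embedding = [2.0, 1.0]
results = top_k_by_inner_product(query_embedding, passage_embeddings, k=2)
```

In this toy setup the retriever ranks passages purely by dot product, so a passage with a larger embedding norm can outrank one that is better aligned with the query; that is a property of inner product scoring in general, not something the excerpt discusses.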
SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning -- Appendix
The knowledge-seeking procedure described in Section 2.1 applies a search algorithm over the graph [...] Each such query takes constant time. As mentioned in Section 2.3, the approach described in this paper can be used to answer any valid [...] We proceed by induction on the number of literals |Q|. Base case. [...] For the experiments on KBQA, we assume that we only have access to pairs of questions and answers, i.e. the actual inferential chain leading from the question to the answer is latent. Therefore, we resort to weak supervision to train the model. Inspired by this insight, we employ a similar technique to enhance the performance of our model.