Question Answering
Conversational Information Seeking
Zamani, Hamed, Trippas, Johanne R., Dalton, Jeff, Radlinski, Filip
Conversational information seeking (CIS) is concerned with a sequence of interactions between one or more users and an information system. Interactions in CIS are primarily based on natural language dialogue, while they may include other types of interactions, such as click, touch, and body gestures. This monograph provides a thorough overview of CIS definitions, applications, interactions, interfaces, design, implementation, and evaluation. This monograph views CIS applications as including conversational search, conversational question answering, and conversational recommendation. Our aim is to provide an overview of past research related to CIS, introduce the current state-of-the-art in CIS, highlight the challenges still being faced in the community. and suggest future directions.
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
Chen, Wenhu, Verga, Pat, de Jong, Michiel, Wieting, John, Cohen, William
Retrieval augmented language models have recently become the standard for knowledge intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory to encode an index of knowledge for the model to retrieve over. Most prior work has employed text passages as the unit of knowledge, which has high coverage at the cost of interpretability, controllability, and efficiency. The opposite properties arise in other methods which have instead relied on knowledge base (KB) facts. At the same time, more recent work has demonstrated the effectiveness of storing and retrieving from an index of Q-A pairs derived from text \citep{lewis2021paq}. This approach yields a high coverage knowledge representation that maintains KB-like properties due to its representations being more atomic units of information. In this work we push this line of research further by proposing a question-answer augmented encoder-decoder model and accompanying pretraining strategy. This yields an end-to-end system that not only outperforms prior QA retrieval methods on single-hop QA tasks but also enables compositional reasoning, as demonstrated by strong performance on two multi-hop QA datasets. Together, these methods improve the ability to interpret and control the model while narrowing the performance gap with passage retrieval systems.
ASQA: Factoid Questions Meet Long-Form Answers
Stelmakh, Ivan, Luan, Yi, Dhingra, Bhuwan, Chang, Ming-Wei
An abundance of datasets and availability of reliable evaluation metrics have resulted in strong progress in factoid question answering (QA). This progress, however, does not easily transfer to the task of long-form QA, where the goal is to answer questions that require in-depth explanations. The hurdles include (i) a lack of high-quality data, and (ii) the absence of a well-defined notion of the answer's quality. In this work, we address these problems by (i) releasing a novel dataset and a task that we call ASQA (Answer Summaries for Questions which are Ambiguous); and (ii) proposing a reliable metric for measuring performance on ASQA. Our task focuses on factoid questions that are ambiguous, that is, have different correct answers depending on interpretation. Answers to ambiguous questions should synthesize factual information from multiple sources into a long-form summary that resolves the ambiguity. In contrast to existing long-form QA tasks (such as ELI5), ASQA admits a clear notion of correctness: a user faced with a good summary should be able to answer different interpretations of the original ambiguous question. We use this notion of correctness to define an automated metric of performance for ASQA. Our analysis demonstrates an agreement between this metric and human judgments, and reveals a considerable gap between human performance and strong baselines.
New techniques in the field of Visual Question Answering part2(Machine Learning)
Abstract: We present a new pre-training method, Multimodal Inverse Cloze Task, for Knowledge-based Visual Question Answering about named Entities (KVQAE). KVQAE is a recently introduced task that consists in answering questions about named entities grounded in a visual context using a Knowledge Base. Therefore, the interaction between the modalities is paramount to retrieve information and must be captured with complex fusion models. As these models require a lot of training data, we design this pre-training task from existing work in textual Question Answering. It consists in considering a sentence as a pseudo-question and its context as a pseudo-relevant passage and is extended by considering images near texts in multimodal documents.
Curriculum Script Distillation for Multilingual Visual Question Answering
Chandu, Khyathi Raghavi, Geramifard, Alborz
Pre-trained models with dual and cross encoders have shown remarkable success in propelling the landscape of several tasks in vision and language in Visual Question Answering (VQA). However, since they are limited by the requirements of gold annotated data, most of these advancements do not see the light of day in other languages beyond English. We aim to address this problem by introducing a curriculum based on the source and target language translations to finetune the pre-trained models for the downstream task. Experimental results demonstrate that script plays a vital role in the performance of these models. Specifically, we show that target languages that share the same script perform better (~6%) than other languages and mixed-script code-switched languages perform better than their counterparts (~5-12%).
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
Tanaka, Ryota, Nishida, Kyosuke, Nishida, Kosuke, Hasegawa, Taku, Saito, Itsumi, Saito, Kuniko
Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems, most of the existing datasets focus on understanding the content relationships within a single image and not across multiple images. In this study, we propose a new multi-image document VQA dataset, SlideVQA, containing 2.6k+ slide decks composed of 52k+ slide images and 14.5k questions about a slide deck. SlideVQA requires complex reasoning, including single-hop, multi-hop, and numerical reasoning, and also provides annotated arithmetic expressions of numerical answers for enhancing the ability of numerical reasoning. Moreover, we developed a new end-to-end document VQA model that treats evidence selection and question answering in a unified sequence-to-sequence format. Experiments on SlideVQA show that our model outperformed existing state-of-the-art QA models, but that it still has a large gap behind human performance. We believe that our dataset will facilitate research on document VQA.
Semantic Web Enabled Geographic Question Answering Framework: GeoTR
Tasar, Ceren Ocal, Komesli, Murat, Unalir, Murat Osman
With the considerable growth of linked data, researchers have focused on how to increase the availability of semantic web technologies to provide practical usages for real life systems. Question answering systems are an example of real-life systems that communicate directly with end users, understand user intention and generate answers. End users do not care about the structural query language or the vocabulary of the knowledge base where the point of a problem arises. In this study, a question answering framework that converts Turkish natural language input into SPARQL queries in the geographical domain is proposed. Additionally, a novel Turkish ontology, which covers a 10th grade geography lesson named Spatial Synthesis Turkey, has been developed to be used as a linked data provider. Moreover, a gap in the literature on Turkish question answering systems, which utilizes linked data in the geographical domain, is addressed. A hybrid system architecture that combines natural language processing techniques with linked data technologies to generate answers is also proposed. Further related research areas are suggested.
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Lerner, Paul, Ferret, Olivier, Guinaudeau, Camille
We present a new pre-training method, Multimodal Inverse Cloze Task, for Knowledge-based Visual Question Answering about named Entities (KVQAE). KVQAE is a recently introduced task that consists in answering questions about named entities grounded in a visual context using a Knowledge Base. Therefore, the interaction between the modalities is paramount to retrieve information and must be captured with complex fusion models. As these models require a lot of training data, we design this pre-training task from existing work in textual Question Answering. It consists in considering a sentence as a pseudo-question and its context as a pseudo-relevant passage and is extended by considering images near texts in multimodal documents. Our method is applicable to different neural network architectures and leads to a 9% relative-MRR and 15% relative-F1 gain for retrieval and reading comprehension, respectively, over a no-pre-training baseline.
There is No Big Brother or Small Brother: Knowledge Infusion in Language Models for Link Prediction and Question Answering
Agarwal, Ankush, Gawade, Sakharam, Channabasavarajendra, Sachin, Bhattacharyya, Pushpak
The integration of knowledge graphs with deep learning is thriving in improving the performance of various natural language processing (NLP) tasks. In this paper, we focus on knowledge-infused link prediction and question answering using language models, T5, and BLOOM across three domains: Aviation, Movie, and Web. In this context, we infuse knowledge in large and small language models and study their performance, and find the performance to be similar. For the link prediction task on the Aviation Knowledge Graph, we obtain a 0.2 hits@1 score using T5-small, T5-base, T5-large, and BLOOM. Using template-based scripts, we create a set of 1 million synthetic factoid QA pairs in the aviation domain from National Transportation Safety Board (NTSB) reports. On our curated QA pairs, the three models of T5 achieve a 0.7 hits@1 score. We validate out findings with the paired student t-test and Cohen's kappa scores. For link prediction on Aviation Knowledge Graph using T5-small and T5-large, we obtain a Cohen's kappa score of 0.76, showing substantial agreement between the models. Thus, we infer that small language models perform similar to large language models with the infusion of knowledge.
A Brain-inspired Memory Transformation based Differentiable Neural Computer for Reasoning-based Question Answering
Liang, Yao, Fang, Hongjian, Zeng, Yi, Zhao, Feifei
Reasoning and question answering as a basic cognitive function for humans, is nevertheless a great challenge for current artificial intelligence. Although the Differentiable Neural Computer (DNC) model could solve such problems to a certain extent, the development is still limited by its high algorithm complexity, slow convergence speed, and poor test robustness. Inspired by the learning and memory mechanism of the brain, this paper proposed a Memory Transformation based Differentiable Neural Computer (MT-DNC) model. MT-DNC incorporates working memory and long-term memory into DNC, and realizes the autonomous transformation of acquired experience between working memory and long-term memory, thereby helping to effectively extract acquired knowledge to improve reasoning ability. Experimental results on bAbI question answering task demonstrated that our proposed method achieves superior performance and faster convergence speed compared to other existing DNN and DNC models. Ablation studies also indicated that the memory transformation from working memory to long-term memory plays essential role in improving the robustness and stability of reasoning. This work explores how brain-inspired memory transformation can be integrated and applied to complex intelligent dialogue and reasoning systems.