Goto

Collaborating Authors

 Question Answering


Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension

arXiv.org Artificial Intelligence

Although many large-scale knowledge bases (KBs) claim to contain multilingual information, their support for many non-English languages is often incomplete. This incompleteness gives birth to the task of cross-lingual question answering over knowledge base (xKBQA), which aims to answer questions in languages different from that of the provided KB. One of the major challenges facing xKBQA is the high cost of data annotation, leading to limited resources available for further exploration. Another challenge is mapping KB schemas and natural language expressions in the questions under cross-lingual settings. In this paper, we propose a novel approach for xKBQA in a reading comprehension paradigm. We convert KB subgraphs into passages to narrow the gap between KB schemas and questions, which enables our model to benefit from recent advances in multilingual pre-trained language models (MPLMs) and cross-lingual machine reading comprehension (xMRC). Specifically, we use MPLMs, with considerable knowledge of cross-lingual mappings, for cross-lingual reading comprehension. Existing high-quality xMRC datasets can be further utilized to finetune our model, greatly alleviating the data scarcity issue in xKBQA. Extensive experiments on two xKBQA datasets in 12 languages show that our approach outperforms various baselines and achieves strong few-shot and zero-shot performance. Our dataset and code are released for further research.


Medical visual question answering using joint self-supervised learning

arXiv.org Artificial Intelligence

Visual Question Answering (VQA) becomes one of the most active research problems in the medical imaging domain. A well-known VQA challenge is the intrinsic diversity between the image and text modalities, and in the medical VQA task, there is another critical problem relying on the limited size of labelled image-question-answer data. In this study we propose an encoder-decoder framework that leverages the image-text joint representation learned from large-scaled medical image-caption data and adapted to the small-sized medical VQA task. The encoder embeds across the image-text dual modalities with self-attention mechanism and is independently pre-trained on the large-scaled medical image-caption dataset by multiple self-supervised learning tasks. Then the decoder is connected to the top of the encoder and fine-tuned using the small-sized medical VQA dataset. The experiment results present that our proposed method achieves better performance comparing with the baseline and SOTA methods.


Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection

arXiv.org Artificial Intelligence

In this work, we propose a method for incorporating question-answering (QA) signals into a summarization model. Our method identifies salient noun phrases (NPs) in the input document by automatically generating wh-questions that are answered by the NPs and automatically determining whether those questions are answered in the gold summaries. This QA-based signal is incorporated into a two-stage summarization model which first marks salient NPs in the input document using a classification model, then conditionally generates a summary. Our experiments demonstrate that the models trained using QA-based supervision generate higher-quality summaries than baseline methods of identifying salient spans on benchmark summarization datasets. Further, we show that the content of the generated summaries can be controlled based on which NPs are marked in the input document. Finally, we propose a method of augmenting the training data so the gold summaries are more consistent with the marked input spans used during training and show how this results in models which learn to better exclude unmarked document content.


Coarse-to-Fine Knowledge Selection for Document Grounded Dialogs

arXiv.org Artificial Intelligence

Multi-document grounded dialogue systems (DGDS) belong to a class of conversational agents that answer users' requests by finding supporting knowledge from a collection of documents. Most previous studies aim to improve the knowledge retrieval model or propose more effective ways to incorporate external knowledge into a parametric generation model. These methods, however, focus on retrieving knowledge from mono-granularity language units (e.g. passages, sentences, or spans in documents), which is not enough to effectively and efficiently capture precise knowledge in long documents. This paper proposes Re3G, which aims to optimize both coarse-grained knowledge retrieval and fine-grained knowledge extraction in a unified framework. Specifically, the former efficiently finds relevant passages in a retrieval-and-reranking process, whereas the latter effectively extracts finer-grain spans within those passages to incorporate into a parametric answer generation model (BART, T5). Experiments on DialDoc Shared Task demonstrate the effectiveness of our method.


Real-World Deployment and Evaluation of Kwame for Science, An AI Teaching Assistant for Science Education in West Africa

arXiv.org Artificial Intelligence

Africa has a high student-to-teacher ratio which limits students' access to teachers for learning support such as educational question answering. In this work, we extended Kwame, our previous AI teaching assistant for coding education, adapted it for science education, and deployed it as a web app. Kwame for Science provides passages from well-curated knowledge sources and related past national exam questions as answers to questions from students based on the Integrated Science subject of the West African Senior Secondary Certificate Examination (WASSCE). Furthermore, students can view past national exam questions along with their answers and filter by year, question type (objectives, theory, and practicals), and topics that were automatically categorized by a topic detection model which we developed (91% unweighted average recall). We deployed Kwame for Science in the real world over 8 months and had 750 users across 32 countries (15 in Africa) and 1.5K questions asked. Our evaluation showed an 87.2% top 3 accuracy (n=109 questions) implying that Kwame for Science has a high chance of giving at least one useful answer among the 3 displayed. We categorized the reasons the model incorrectly answered questions to provide insights for future improvements. We also share challenges and lessons with the development, deployment, and human-computer interaction component of such a tool to enable other researchers to deploy similar tools. With a first-of-its-kind tool within the African context, Kwame for Science has the potential to enable the delivery of scalable, cost-effective, and quality remote education to millions of people across Africa.


Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media

arXiv.org Artificial Intelligence

This open access book provides a comprehensive overview of the state of the art in research and applications of Foundation Models and is intended for readers familiar with basic Natural Language Processing (NLP) concepts. Over the recent years, a revolutionary new paradigm has been developed for training models for NLP. These models are first pre-trained on large collections of text documents to acquire general syntactic knowledge and semantic information. Then, they are fine-tuned for specific tasks, which they can often solve with superhuman accuracy. When the models are large enough, they can be instructed by prompts to solve new tasks without any fine-tuning. Moreover, they can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning. Because they provide a blueprint for solving many tasks in artificial intelligence, they have been called Foundation Models. After a brief introduction to basic NLP models the main pre-trained language models BERT, GPT and sequence-to-sequence transformer are described, as well as the concepts of self-attention and context-sensitive embedding. Then, different approaches to improving these models are discussed, such as expanding the pre-training criteria, increasing the length of input texts, or including extra knowledge. An overview of the best-performing models for about twenty application areas is then presented, e.g., question answering, translation, story generation, dialog systems, generating images from text, etc. For each application area, the strengths and weaknesses of current models are discussed, and an outlook on further developments is given. In addition, links are provided to freely available program code. A concluding chapter summarizes the economic opportunities, mitigation of risks, and potential developments of AI.


Bridge the Gap between Language models and Tabular Understanding

arXiv.org Artificial Intelligence

Table pretrain-then-finetune paradigm has been proposed and employed at a rapid pace after the success of pre-training in the natural language domain. Despite the promising findings in tabular pre-trained language models (TPLMs), there is an input gap between pre-training and fine-tuning phases. For instance, TPLMs jointly pre-trained with table and text input could be effective for tasks also with table-text joint input like table question answering, but it may fail for tasks with only tables or text as input such as table retrieval. To this end, we propose UTP, an approach that dynamically supports three types of multi-modal inputs: table-text, table, and text. Specifically, UTP is pre-trained with two strategies: (1) We first utilize a universal mask language modeling objective on each kind of input, enforcing the model to adapt various inputs. (2) We then present Cross-Modal Contrastive Regularization (CMCR), which utilizes contrastive learning to encourage the consistency between table-text cross-modality representations via unsupervised instance-wise training signals during pre-training. By these means, the resulting model not only bridges the input gap between pre-training and fine-tuning but also advances in the alignment of table and text. Extensive results show UTP achieves superior results on uni-modal input tasks (e.g., table retrieval) and cross-modal input tasks (e.g., table question answering).


A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods

arXiv.org Artificial Intelligence

By focusing on one such two "how to share" categories into task, the model ignores knowledge from the training five categories, including feature learning approach, signals of related tasks (Ruder, 2017). There low-rank approach, task clustering approach, task are a great number of tasks in NLP, from syntax relation learning approach, and decomposition approach; parsing to information extraction, from machine Crawshaw (2020) presented more recent translation to question answering: each requires models in both single-domain and multi-modal architectures, a model dedicated to learning from data. Biologically, as well as an overview of optimization humans learn natural languages, from basic methods in MTL. Nevertheless, it is still not clearly grammar to complex semantics in a single brain understood how to design and train a single model (Hashimoto et al., 2017). In the field of machine to handle a variety of NLP tasks according to task learning, multi-task learning (MTL) aims to leverage relatedness. Especially when faced with a set of useful information shared across multiple related tasks that are seldom simultaneously trained previously, tasks to improve the generalization performance it is of crucial importance that researchers on all tasks (Caruana, 1997). In deep neural find proper auxiliary tasks and assess the feasibility networks, it is generally achieved by sharing part of of such multi-task learning attempt.


The Question Answering concept part2 (Machine Learning)

#artificialintelligence

Abstract: This paper presents a new method for end-to-end Video Question Answering (VideoQA), aside from the current popularity of using large-scale pre-training with huge feature extractors. We achieve this with a pyramidal multimodal transformer (PMT) model, which simply incorporates a learnable word embedding layer, a few convolutional and transformer layers. We use the anisotropic pyramid to fulfill video-language interactions across different spatio-temporal scales. In addition to the canonical pyramid, which includes both bottom-up and top-down pathways with lateral connections, novel strategies are proposed to decompose the visual feature stream into spatial and temporal sub-streams at different scales and implement their interactions with the linguistic semantics while preserving the integrity of local and global semantics. We demonstrate better or on-par performances with high computational efficiency against state-of-the-art methods on five VideoQA benchmarks.


Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering

arXiv.org Artificial Intelligence

The field of visual question answering (VQA) has recently seen a surge in research focused on providing explanations for predicted answers. However, current systems mostly rely on separate models to predict answers and generate explanations, leading to less grounded and frequently inconsistent results. To address this, we propose a multitask learning approach towards a Unified Model for Answer and Explanation generation (UMAE). Our approach involves the addition of artificial prompt tokens to training data and fine-tuning a multimodal encoder-decoder model on a variety of VQA-related tasks. In our experiments, UMAE models surpass the prior state-of-the-art answer accuracy on A-OKVQA by 10~15%, show competitive results on OK-VQA, achieve new state-of-the-art explanation scores on A-OKVQA and VCR, and demonstrate promising out-of-domain performance on VQA-X.