Goto

Collaborating Authors

 Question Answering


Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering

arXiv.org Artificial Intelligence

Integrating outside knowledge for reasoning in visio-linguistic tasks such as visual question answering (VQA) is an open problem. Given that pretrained language models have been shown to include world knowledge, we propose to use a unimodal (text-only) train and inference procedure based on automatic off-the-shelf captioning of images and pretrained language models. Our results on a visual question answering task which requires external knowledge (OK-VQA) show that our text-only model outperforms pretrained multimodal (image-text) models of comparable number of parameters. In contrast, our model is less effective in a standard VQA task (VQA 2.0) confirming that our text-only method is specially effective for tasks requiring external knowledge. In addition, we show that our unimodal model is complementary to multimodal models in both OK-VQA and VQA 2.0, and yield the best result to date in OK-VQA among systems not using external knowledge graphs, and comparable to systems that do use them. Our qualitative analysis on OK-VQA reveals that automatic captions often fail to capture relevant information in the images, which seems to be balanced by the better inference ability of the text-only language models. Our work opens up possibilities to further improve inference in visio-linguistic tasks.


Topic Transferable Table Question Answering

arXiv.org Artificial Intelligence

Weakly-supervised table question-answering(TableQA) models have achieved state-of-art performance by using pre-trained BERT transformer to jointly encoding a question and a table to produce structured query for the question. However, in practical settings TableQA systems are deployed over table corpora having topic and word distributions quite distinct from BERT's pretraining corpus. In this work we simulate the practical topic shift scenario by designing novel challenge benchmarks WikiSQL-TS and WikiTQ-TS, consisting of train-dev-test splits in five distinct topic groups, based on the popular WikiSQL and WikiTableQuestions datasets. We empirically show that, despite pre-training on large open-domain text, performance of models degrades significantly when they are evaluated on unseen topics. In response, we propose T3QA (Topic Transferable Table Question Answering) a pragmatic adaptation framework for TableQA comprising of: (1) topic-specific vocabulary injection into BERT, (2) a novel text-to-text transformer generator (such as T5, GPT2) based natural language question generation pipeline focused on generating topic specific training data, and (3) a logical form reranker. We show that T3QA provides a reasonably good baseline for our topic shift benchmarks. We believe our topic split benchmarks will lead to robust TableQA solutions that are better suited for practical deployment.


Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning

arXiv.org Artificial Intelligence

Motivated by suggested question generation in conversational news recommendation systems, we propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers. We begin by collecting a new dataset of news articles with questions as titles and pairing them with summaries of varying length. This dataset is used to learn a QA pair generation model producing summaries as answers that balance brevity with sufficiency jointly with their corresponding questions. We then reinforce the QA pair generation process with a differentiable reward function to mitigate exposure bias, a common problem in natural language generation. Both automatic metrics and human evaluation demonstrate these QA pairs successfully capture the central gists of the articles and achieve high answer accuracy.


FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models

arXiv.org Artificial Intelligence

The task of learning from only a few examples (called a few-shot setting) is of key importance and relevance to a real-world setting. For question answering (QA), the current state-of-the-art pre-trained models typically need fine-tuning on tens of thousands of examples to obtain good results. Their performance degrades significantly in a few-shot setting (< 100 examples). To address this, we propose a simple fine-tuning framework that leverages pre-trained text-to-text models and is directly aligned with their pre-training framework. Specifically, we construct the input as a concatenation of the question, a mask token representing the answer span and a context. Given this input, the model is fine-tuned using the same objective as that of its pre-training objective. Through experimental studies on various few-shot configurations, we show that this formulation leads to significant gains on multiple QA benchmarks (an absolute gain of 34.2 F1 points on average when there are only 16 training examples). The gains extend further when used with larger models (Eg:- 72.3 F1 on SQuAD using BART-large with only 32 examples) and translate well to a multilingual setting . On the multilingual TydiQA benchmark, our model outperforms the XLM-Roberta-large by an absolute margin of upto 40 F1 points and an average of 33 F1 points in a few-shot setting (<= 64 training examples). We conduct detailed ablation studies to analyze factors contributing to these gains.


4 Ways IBM Watson's Artificial Intelligence Is Changing Healthcare

#artificialintelligence

Some say that artificial intelligence (AI) will radically change healthcare in the future. But that prediction overlooks an important detail: AI is already significantly changing healthcare. IBM (NYSE:IBM) Watson Health general manager Deborah DiSanzo spoke at the annual J. P. Morgan Healthcare Conference on Wednesday. She provided an update on the progress that IBM Watson, the AI system famous for beating Jeopardy! DiSanzo highlighted four areas where AI is making a big difference today.


Question Answering with Python, HuggingFace Transformers and Machine Learning – MachineCurve

#artificialintelligence

If you would like to read about DistilBERT in more detail I'd suggest clicking here for the article, but from what the abstract suggests it was made 60% faster by performing a 40% size reduction while retaining 97% of its language understanding. This is a significant improvement and a great optimization with respect to traditional or'vanilla' BERT. As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster.


ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Archival News Collections

arXiv.org Artificial Intelligence

In the last few years, open-domain question answering (ODQA) has advanced rapidly due to the development of deep learning techniques and the availability of large-scale QA datasets. However, the current datasets are essentially designed for synchronic document collections (e.g., Wikipedia). Temporal news collections such as long-term news archives spanning several decades, are rarely used in training the models despite they are quite valuable for our society. In order to foster the research in the field of ODQA on such historical collections, we present ArchivalQA, a large question answering dataset consisting of 1,067,056 question-answer pairs which is designed for temporal news QA. In addition, we create four subparts of our dataset based on the question difficulty levels and the containment of temporal expressions, which we believe could be useful for training or testing ODQA systems characterized by different strengths and abilities. The novel QA dataset-constructing framework that we introduce can be also applied to create datasets over other types of collections.


Biomedical Question Answering: A Survey of Approaches and Challenges

arXiv.org Artificial Intelligence

Professionals as well as the general public need effective assistance to access, understand and consume complex biomedical concepts. For example, doctors always want to be aware of up-to-date clinical evidence for the diagnosis and treatment of diseases under the scheme of Evidence-based Medicine [165], and the general public is becoming increasingly interested in learning about their own health conditions on the Internet [54]. Traditionally, Information Retrieval (IR) systems, such as PubMed, have been used to meet such information needs. However, classical IR is still not efficient enough [71, 77, 99, 164]. For instance, Russell-Rose and Chamberlain [164] reported that it requires 4 expert hours to answer complex medical queries using search engines. Compared with the retrieval systems that typically return a list of relevant documents for the users to read, Question Answering (QA) systems that provide direct answers to users' questions are more straightforward and intuitive. In general, QA itself is a challenging benchmark Natural Language Processing (NLP) task for evaluating the abilities of intelligent systems to understand a question, retrieve and utilize relevant materials and generate its answer. With the rapid development of computing hardware, modern QA models, especially those based on deep learning [30, 31, 42, 146, 171], achieve comparable or even better performance than human on many benchmark datasets [67, 83, 154, 155, 215] and have been successfully adopted in general domain search engines and conversational assistants [150, 236]. The Text REtrieval Conference (TREC) QA Track has triggered the modern QA research [197], when QA models were mostly based on IR.


It is AI's Turn to Ask Human a Question: Question and Answer Pair Generation for Children Storybooks in FairytaleQA Dataset

arXiv.org Artificial Intelligence

Existing question answering (QA) datasets are created mainly for the application of having AI to be able to answer questions asked by humans. But in educational applications, teachers and parents sometimes may not know what questions they should ask a child that can maximize their language learning results. With a newly released book QA dataset (FairytaleQA), which educational experts labeled on 46 fairytale storybooks for early childhood readers, we developed an automated QA generation model architecture for this novel application. Our model (1) extracts candidate answers from a given storybook passage through carefully designed heuristics based on a pedagogical framework; (2) generates appropriate questions corresponding to each extracted answer using a language model; and, (3) uses another QA model to rank top QA-pairs. Automatic and human evaluations show that our model outperforms baselines. We also demonstrate that our method can help with the scarcity issue of the children's book QA dataset via data augmentation on 200 unlabeled storybooks.


Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering

arXiv.org Artificial Intelligence

Spoken question answering (SQA) requires fine-grained understanding of both spoken documents and questions for the optimal answer prediction. In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage. In the self-supervised stage, we propose three auxiliary self-supervised tasks, including utterance restoration, utterance insertion, and question discrimination, and jointly train the model to capture consistency and coherence among speech documents without any additional data or annotations. We then propose to learn noise-invariant utterance representations in a contrastive objective by adopting multiple augmentation strategies, including span deletion and span substitution. Besides, we design a Temporal-Alignment attention to semantically align the speech-text clues in the learned common space and benefit the SQA tasks. By this means, the training schemes can more effectively guide the generation model to predict more proper answers. Experimental results show that our model achieves state-of-the-art results on three SQA benchmarks.