Question Answering
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing
Bajaj, Goonmeet, Bandyopadhyay, Bortik, Schmidt, Daniel, Maneriker, Pranav, Myers, Christopher, Parthasarathy, Srinivasan
Visual Question Answering (VQA) systems are tasked with answering natural language questions corresponding to a presented image. Current VQA datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need for improving the balance of such datasets to reduce the system's dependency on memorized linguistic features and statistical biases and to allow for improved visual understanding. However, it is unclear whether there are any latent patterns that can be used to quantify and explain these failures. To better quantify our understanding of the performance of VQA models, we use a taxonomy of Knowledge Gaps (KGs) to tag each question with one or more types of KGs. Each KG describes the reasoning abilities needed to arrive at a resolution, and failure to resolve a gap indicates an absence of the required reasoning ability. After identifying KGs for each question, we examine the skew in the distribution of the number of questions for each KG. To reduce the skew in the distribution of questions across KGs, we introduce a targeted question generation model. This model allows us to generate new types of questions for an image.
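The skew examined in this abstract can be illustrated with a small counting sketch. It is not the authors' code: the KG label names and the max/min imbalance ratio are assumptions chosen for illustration, and the abstract does not say which skew measure the paper uses.

```python
from collections import Counter


def kg_distribution(question_tags):
    """Count how many questions carry each Knowledge Gap (KG) tag.

    question_tags: one list of KG labels per question.
    """
    counts = Counter()
    for tags in question_tags:
        # A question counts once per KG, even if a tag is repeated.
        counts.update(set(tags))
    return counts


def imbalance_ratio(counts):
    """Most frequent KG count divided by least frequent (1.0 = balanced).

    This ratio is an illustrative choice, not a measure from the paper.
    """
    values = list(counts.values())
    return max(values) / min(values)


# Hypothetical KG labels; the paper's taxonomy is not listed in the abstract.
questions = [
    ["spatial"],
    ["spatial", "attribute"],
    ["spatial"],
    ["commonsense"],
]

counts = kg_distribution(questions)
print(counts.most_common())
print(imbalance_ratio(counts))
```

A high ratio would point to the under-represented KGs for which a targeted question generator could produce additional questions.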
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue. First, three language modeling tasks (token-level language modeling, utterance-level language modeling, and utterance order prediction) are used to pre-train the transformers, which learn both token and utterance embeddings for better understanding of dialogue contexts. Then, multi-task learning between utterance prediction and token span prediction is applied to fine-tune for span-based question answering (QA). Our approach is evaluated on the FriendsQA dataset and shows improvements of 3.8% and 1.4% over the two state-of-the-art transformer models, BERT and RoBERTa, respectively.
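For readers unfamiliar with span-based QA, the following sketch shows the generic decoding step that turns per-token start and end scores into an answer span. It is not the paper's model: the function name `best_span`, the `max_span_len` parameter, and the example scores are assumptions for illustration.

```python
def best_span(start_scores, end_scores, max_span_len=10):
    """Pick the token span (i, j), i <= j, with the highest combined score.

    start_scores[i]: score for token i starting the answer.
    end_scores[j]:   score for token j ending the answer.
    Spans longer than max_span_len tokens are not considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, start in enumerate(start_scores):
        last = min(i + max_span_len, len(end_scores))
        for j in range(i, last):
            score = start + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best, best_score


# Made-up scores for a four-token context.
start = [0.1, 2.0, 0.3, 0.2]
end = [0.0, 0.5, 1.5, 3.0]

print(best_span(start, end, max_span_len=2))
print(best_span(start, end))
```

The length cap is a common practical constraint: without it, a high end score far from the start can produce an implausibly long answer.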
IBM Watson can answer all your coronavirus questions
In order to help government agencies, academic institutions and healthcare organizations handle the influx of calls and messages regarding the coronavirus, IBM has announced that it will provide a bundle of Watson services for free. The company will combine Watson Assistant, which uses IBM Research's natural language processing technology, with Watson Discovery to create IBM Watson Assistant for Citizens. The new Watson suite will be available online and on smartphones and will be free for at least 90 days. According to IBM, wait times for coronavirus-related questions are exceeding two hours, so the company believes that using AI via Watson may be able to help speed up response times. "While helping government agencies and healthcare institutions use AI to get critical information out to their citizens remains a high priority right now, the current environment has made it clear that every business in every industry should find ways to digitally engage with their clients and employees. With today's news, IBM is taking years of experience in helping thousands of global businesses and institutions use Natural Language Processing and other advanced AI technologies to better meet the demands of their constituents, and now applying it to the COVID-19 crisis. AI has the power to be your assistant during this uncertain time."
Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering
Banerjee, Pratyay, Baral, Chitta
Open Domain Question Answering requires systems to retrieve external knowledge and perform multi-hop reasoning by composing knowledge spread over multiple sentences. In the recently introduced open domain question answering challenge datasets, QASC and OpenBookQA, we need to perform retrieval of facts and compose facts to correctly answer questions. In our work, we learn a semantic knowledge ranking model to re-rank knowledge retrieved through Lucene-based information retrieval systems. We further propose a "knowledge fusion model" which leverages knowledge in BERT-based language models with externally retrieved knowledge and improves the knowledge understanding of the BERT-based language models. On both OpenBookQA and QASC datasets, the knowledge fusion model with semantically re-ranked knowledge outperforms previous attempts.
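The retrieve-then-re-rank pattern described above can be sketched as follows. The paper learns a semantic ranking model; here a simple word-overlap (Jaccard) score stands in for it, so `overlap_score`, `rerank`, and the example facts are all assumptions for illustration, not the authors' method.

```python
def overlap_score(question, fact):
    """Jaccard overlap between word sets; a stand-in for a learned scorer."""
    q_words = set(question.lower().split())
    f_words = set(fact.lower().split())
    if not q_words or not f_words:
        return 0.0
    return len(q_words & f_words) / len(q_words | f_words)


def rerank(question, retrieved_facts, score_fn=overlap_score, top_k=2):
    """Re-order facts from a first-stage retriever by score_fn, keep top_k."""
    ranked = sorted(
        retrieved_facts,
        key=lambda fact: score_fn(question, fact),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical first-stage retrieval results (e.g. from a keyword index).
question = "what do plants need to grow"
retrieved = [
    "the sky is blue",
    "plants need sunlight to grow",
    "rocks are hard",
]

top_facts = rerank(question, retrieved)
print(top_facts)
```

Passing the scorer as `score_fn` keeps the pipeline unchanged if the overlap heuristic is later swapped for a trained ranking model.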
Talk to Papers: Bringing Neural Question Answering to Academic Search
We introduce Talk to Papers, which exploits recent open-domain question answering (QA) techniques to improve the current experience of academic search. It is designed to enable researchers to use natural language queries to find precise answers and extract insights from a massive amount of academic papers. We present a large improvement over a classic search engine baseline on several standard QA datasets and provide the community with a collaborative data collection tool to curate the first natural language processing research QA dataset via a community effort.
Generating Rationales in Visual Question Answering
Ayyubi, Hammad A., Tanjim, Md. Mehrab, McAuley, Julian J., Cottrell, Garrison W.
Despite recent advances in Visual Question Answering (VQA), it remains a challenge to determine how much success can be attributed to sound reasoning and comprehension ability. We seek to investigate this question by proposing a new task of rationale generation. Essentially, we task a VQA model with generating rationales for the answers it predicts. We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground-truth rationales along with visual questions and answers. We first investigate commonsense understanding in one of the leading VCR models, ViLBERT, by generating rationales from pretrained weights using a state-of-the-art language model, GPT-2. Next, we seek to jointly train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales. We show that this kind of training injects commonsense understanding in the VQA model through quantitative and qualitative evaluation metrics.
R3: A Reading Comprehension Benchmark Requiring Reasoning Processes
Wang, Ran, Tao, Kun, Song, Dingjie, Zhang, Zhilong, Ma, Xiao, Su, Xi'ao, Dai, Xinyu
Existing question answering systems can only predict answers without explicit reasoning processes, which hinders their explainability and makes us overestimate their ability to understand and reason over natural language. In this work, we propose a novel task of reading comprehension, in which a model is required to provide final answers and reasoning processes. To this end, we introduce a formalism for reasoning over unstructured text, namely Text Reasoning Meaning Representation (TRMR). TRMR consists of three phrases, which is expressive enough to characterize the reasoning process to answer reading comprehension questions. We develop an annotation platform to facilitate TRMR's annotation, and release the R3 dataset, a Reading comprehension benchmark Requiring Reasoning processes. R3 contains over 60K question-answer pairs and their TRMRs. Our dataset is available at: http://anonymous.
Former IBM Watson Team Leader David Ferrucci on AI and Elemental Cognition
Dr. David Ferrucci is one of the few people who have created a benchmark in the history of AI because when IBM Watson won Jeopardy we reached a milestone many thought impossible. I was very privileged to have Ferrucci on my podcast in early 2012 when we spent an hour on Watson's intricacies and importance. Well, it's been almost 8 years since our original conversation and it was time to catch up with David to talk about the things that have happened in the world of AI, the things that didn't happen but were supposed to, and our present and future in relation to Artificial Intelligence. All in all, I was super excited to have Ferrucci back on my podcast and hope you enjoy our conversation as much as I did. During this 90 min interview with David Ferrucci, we cover a variety of interesting topics such as: his perspective on IBM Watson; AI, hype and human cognition; benchmarks on the singularity timeline; his move away from IBM to the biggest hedge fund in the world; Elemental Cognition and its goals, mission and architecture; Noam Chomsky and Marvin Minsky's skepticism of Watson; deductive, inductive and abductive learning; leading and managing from the architecture down; Black Box vs Open Box AI; CLARA – Collaborative Learning and Reading Agent and the best and worst applications thereof; the importance of meaning and whether AI can be the source of it; whether AI is the greatest danger humanity is facing today; why technology is a magnifying mirror; why the world is transformed by asking questions.
NukeBERT: A Pre-trained language model for Low Resource Nuclear Domain
Jain, Ayush, Ganesamoorty, Meenachi
Significant advances have been made in recent years in Natural Language Processing, with machines surpassing human performance in many tasks, including but not limited to Question Answering. The majority of deep learning methods for Question Answering target domains with large datasets and highly matured literature. The area of Nuclear and Atomic energy has largely remained unexplored in exploiting non-annotated data for driving industry-viable applications. Because no suitable dataset existed, a new dataset was created from 7000 research papers in the nuclear domain. This paper contributes to research in understanding nuclear domain knowledge, which is then evaluated on the Nuclear Question Answering Dataset (NQuAD) created by nuclear domain experts as part of this research. NQuAD contains 612 questions developed on 181 paragraphs randomly selected from the IGCAR research paper corpus. In this paper, the Nuclear Bidirectional Encoder Representational Transformers (NukeBERT) is proposed, which incorporates a novel technique for building BERT vocabulary to make it suitable for tasks with less training data. The experiments evaluated on NQuAD revealed that NukeBERT was able to outperform BERT significantly, thus validating the adopted methodology. Training NukeBERT is computationally expensive, and hence we will be open-sourcing the NukeBERT pretrained weights and NQuAD to foster further research work in the nuclear domain.
Self-Critical Reasoning for Robust Visual Question Answering
Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors and fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than other competitive answer candidates. The influential regions are either determined from human visual/textual explanations or automatically from just significant words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state of the art, i.e., 49.5% using textual explanations and 48.5% using automatically … Papers published at the Neural Information Processing Systems Conference.