Question Answering
CogME: A Novel Evaluation Metric for Video Understanding Intelligence
Shin, Minjung, Kim, Jeonghoon, Choi, Seongho, Heo, Yu-Jung, Kim, Donghyun, Lee, Minsu, Zhang, Byoung-Tak, Ryu, Jeh-Kwang
Developing video understanding intelligence is challenging because it requires the holistic integration of images, scripts, and sounds, drawing on natural language processing, temporal dependency, and reasoning. Recently, substantial attempts have been made on several large-scale video datasets with associated question answering (QA). However, existing evaluation metrics for video question answering (VideoQA) do not provide meaningful analysis. To make progress, we argue that a well-made framework, grounded in the way humans understand, is required to explain and evaluate understanding performance in detail. We therefore propose a top-down evaluation system for VideoQA, based on the human cognitive process and story elements: Cognitive Modules for Evaluation (CogME). CogME is composed of three cognitive modules: targets, contents, and thinking. The interaction among the modules in the understanding procedure can be expressed in one sentence as follows: "I understand the CONTENT of the TARGET through a way of THINKING." Each module has sub-components derived from the story elements. We can specify the required aspects of understanding by annotating individual questions with the sub-components. CogME thus provides a framework for an elaborated specification of VideoQA datasets. To examine the suitability of a VideoQA dataset for validating video understanding intelligence, we evaluated the baseline model of the DramaQA dataset by applying CogME. The evaluation reveals that story elements are unevenly reflected in the existing dataset, and that a model based on the dataset may make biased predictions. Although this study has only been able to cover a narrow range of stories, we expect that it offers a first step toward considering the human cognitive process in the video understanding intelligence of both humans and AI.
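The annotation idea in the abstract above can be sketched as a per-tag accuracy breakdown. This is a minimal illustration only: the three module names come from the abstract, but the sub-component labels ("character", "identity", "recall", and so on), the record layout, and the function name `accuracy_by_tag` are all assumptions made for the example, not the CogME authors' scheme or code.

```python
from collections import defaultdict


def accuracy_by_tag(records):
    """Return accuracy per (module, sub-component) tag.

    Each record is a dict with a "tags" mapping (module -> sub-component)
    and a boolean "correct" flag for the model's answer.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        for module, tag in record["tags"].items():
            key = (module, tag)
            totals[key] += 1
            hits[key] += int(record["correct"])
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical annotated questions; the tag values are placeholders.
RECORDS = [
    {"tags": {"target": "character", "content": "identity", "thinking": "recall"},
     "correct": True},
    {"tags": {"target": "character", "content": "emotion", "thinking": "reasoning"},
     "correct": False},
    {"tags": {"target": "object", "content": "identity", "thinking": "recall"},
     "correct": True},
]

ACCURACY = accuracy_by_tag(RECORDS)
for key in sorted(ACCURACY):
    print(key, ACCURACY[key])
```

A breakdown of this kind is one way to see whether some story elements are underrepresented or answered poorly, which a single overall accuracy figure would hide.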
Zero-shot Visual Question Answering using Knowledge Graph
Chen, Zhuo, Chen, Jiaoyan, Geng, Yuxia, Pan, Jeff Z., Yuan, Zonggang, Chen, Huajun
Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when any component performs poorly, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue -- in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method achieves state-of-the-art performance in Zero-shot VQA with unseen answers, while also dramatically improving existing end-to-end models on the normal F-VQA task.
Reasoning with Language Models and Knowledge Graphs for Question Answering
From search engines to personal assistants, we use question-answering systems every day. When we ask a question ("Where was the painter of the Mona Lisa born?"), the system needs to gather background knowledge ("The Mona Lisa was painted by Leonardo da Vinci", "Leonardo da Vinci was born in Italy") and reason over it to produce the answer ("Italy"). In recent AI research, such background knowledge is commonly available in the form of knowledge graphs (KGs) and language models (LMs) pre-trained on a large set of documents. In KGs, entities are represented as nodes and the relations between them as edges. Examples of KGs include Freebase (general-purpose facts), ConceptNet (commonsense), and UMLS (biomedical facts).
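The Mona Lisa example above can be sketched as a two-hop lookup over a toy knowledge graph. This is a minimal illustration, assuming a plain list of (head, relation, tail) triples; the relation names `painted_by` and `born_in` and the helper names are made up for the example and do not come from Freebase, ConceptNet, or UMLS.

```python
from collections import defaultdict

# Toy knowledge graph: entities are nodes, relations are labelled edges.
# The two facts are the ones quoted in the text; relation names are assumed.
TRIPLES = [
    ("Mona Lisa", "painted_by", "Leonardo da Vinci"),
    ("Leonardo da Vinci", "born_in", "Italy"),
]


def build_index(triples):
    """Map (head, relation) to the list of tail entities."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation)].append(tail)
    return index


def follow_path(index, start, relations):
    """Follow a sequence of relations from a start entity.

    Returns the set of entities reached after the last hop.
    """
    frontier = {start}
    for relation in relations:
        frontier = {
            tail
            for node in frontier
            for tail in index.get((node, relation), [])
        }
    return frontier


INDEX = build_index(TRIPLES)
ANSWER = follow_path(INDEX, "Mona Lisa", ["painted_by", "born_in"])
print(ANSWER)  # {'Italy'}
```

Real systems must also work out which relations to follow from the question text; here the relation path is supplied by hand.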
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Wang, Jianyu, Bao, Bing-Kun, Xu, Changsheng
Video question answering is a challenging task, which requires agents to be able to understand rich video content and perform spatial-temporal reasoning. However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) Even for the same video, different questions may require different numbers of video clips or objects to infer the answer with relational reasoning; (2) During reasoning, appearance and motion features have a complicated interdependence: they are correlated with and complementary to each other. Based on these observations, we propose a Dual-Visual Graph Reasoning Unit (DualVGR) which reasons over videos in an end-to-end fashion. The first contribution of our DualVGR is the design of an explainable Query Punishment Module, which can filter out irrelevant visual features through multiple cycles of reasoning. The second contribution is the proposed Video-based Multi-view Graph Attention Network, which captures the relations between appearance and motion features. Our DualVGR network achieves state-of-the-art performance on the benchmark MSVD-QA and SVQA datasets, and demonstrates competitive results on the benchmark MSRVTT-QA dataset. Our code is available at https://github.com/MMIR/DualVGR-VideoQA.
Ask Wikipedia ELI5-like Questions Using Long-Form Question Answering on Haystack
Recent advancements in NLP question answering (QA)-based systems have been astonishing. QA systems built on top of the most recent language models (BERT, RoBERTa, etc.) can answer factoid-based questions with relative ease and excellent precision. The task involves finding the relevant document passages containing the answer and extracting the answer by scanning the correct word token span. More challenging QA systems engage with so-called "generative question answering". These systems focus on handling questions where the provided context passages are not simply the source tokens for extracted answers, but provide the larger context to synthesize original answers.
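The extractive step described above, picking an answer as a token span, can be sketched as a search over start and end scores. This is a minimal illustration, assuming a reader model has already produced one start score and one end score per token; the tokens, the scores, and the function name `best_span` are made up for the example and are not Haystack's API.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Return (start, end) token indices of the highest-scoring span.

    A span's score is start_scores[start] + end_scores[end], with
    start <= end and at most max_len tokens in the span.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, start_score in enumerate(start_scores):
        last = min(i + max_len, len(end_scores))
        for j in range(i, last):
            score = start_score + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Hypothetical passage tokens and per-token scores from a reader model.
TOKENS = ["Leonardo", "was", "born", "in", "Italy", "."]
START_SCORES = [0.1, 0.0, 0.2, 0.3, 2.5, 0.0]
END_SCORES = [0.0, 0.1, 0.1, 0.2, 2.8, 0.4]

START, END = best_span(START_SCORES, END_SCORES)
EXTRACTED = " ".join(TOKENS[START:END + 1])
print(EXTRACTED)  # Italy
```

Generative QA differs in that the answer is produced token by token instead of being copied out as a span of the passage.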
More Play and Less Prep: Flamel.AI Automates Role-Playing Games with IBM Watson
Alex Migitko started playing tabletop role-playing games (RPGs) 15 years ago. But as life got more demanding, he couldn't commit to the time needed for preparation and play, both as a game facilitator and player. Though passionate about gaming, he ultimately stopped. These "aging out" stories are all too common. Players fall in love with gaming because it provides such depth and breadth of creativity and escape.
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Karamcheti, Siddharth, Krishna, Ranjay, Fei-Fei, Li, Manning, Christopher D.
Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection. To understand this discrepancy, we profile 8 active learning methods on a per-example basis, and identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn (e.g., questions that ask about text in images or require external knowledge). Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work.
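Pool-based acquisition, as discussed above, can be sketched with entropy-based uncertainty sampling, one common active learning strategy. This is a minimal illustration, assuming the model's predicted answer distributions for the unlabeled pool are given as plain lists; it is not claimed to be one of the eight methods the paper profiles, and the pool values are made up.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_by_entropy(pool_probs, k):
    """Return indices of the k pool examples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted distributions for four unlabeled examples.
POOL = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.6, 0.4],
    [1.0, 0.0],
]

SELECTED = select_by_entropy(POOL, 2)
print(SELECTED)  # [1, 2]
```

Uncertainty-driven selection of this kind favors examples the model stays unsure about, which fits the abstract's description of collective outliers as examples that methods prefer to acquire but models fail to learn.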
Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints
Wu, Yuxiang, Minervini, Pasquale, Stenetorp, Pontus, Riedel, Sebastian
Adaptive Computation (AC) has been shown to be effective in improving the efficiency of Open-Domain Question Answering (ODQA) systems. However, current AC approaches require tuning of all model parameters, and training state-of-the-art ODQA models requires significant computational resources that may not be available for most researchers. We propose Adaptive Passage Encoder, an AC method that can be applied to an existing ODQA model and can be trained efficiently on a single GPU. It keeps the parameters of the base ODQA model fixed, but it overrides the default layer-by-layer computation of the encoder with an AC policy that is trained to optimise the computational efficiency of the model. Our experimental results show that our method improves upon a state-of-the-art model on two datasets, and is also more accurate than previous AC methods due to the stronger base ODQA model. All source code and datasets are available at https://github.com/uclnlp/APE.
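The idea of overriding uniform layer-by-layer computation can be sketched as a greedy, budgeted allocation of encoder layers across passages. This is a minimal illustration only: the priority function, the budget, and the name `allocate_layers` are assumptions for the example, and the learned policy of Adaptive Passage Encoder is not reproduced here.

```python
def allocate_layers(num_passages, num_layers, budget, priority):
    """Greedily spend a layer budget across passages.

    At each step, the passage with the highest priority(index, depth)
    advances by one encoder layer. Returns the depth reached per passage.
    """
    depth = [0] * num_passages
    for _ in range(budget):
        candidates = [i for i in range(num_passages) if depth[i] < num_layers]
        if not candidates:
            break
        chosen = max(candidates, key=lambda i: priority(i, depth[i]))
        depth[chosen] += 1
    return depth


# Hypothetical relevance estimates for three retrieved passages.
RELEVANCE = [8, 2, 5]


def toy_priority(index, current_depth):
    """Assumed priority: relevance with diminishing returns per layer."""
    return RELEVANCE[index] / (1 + current_depth)


DEPTHS = allocate_layers(num_passages=3, num_layers=3, budget=5,
                         priority=toy_priority)
print(DEPTHS)  # [3, 0, 2]
```

With a budget of 5 layer computations in place of the 9 needed to encode every passage fully, the least promising passage is never encoded at all.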
Overview of BioASQ 2021: The ninth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Nentidis, Anastasios, Katsimpras, Georgios, Vandorou, Eirini, Krithara, Anastasia, Gasco, Luis, Krallinger, Martin, Paliouras, Georgios
Advancing the state of the art in large-scale biomedical semantic indexing and question answering is the main focus of the BioASQ challenge. BioASQ organizes tasks in which different teams develop systems that are evaluated on the same benchmark datasets, which represent the real information needs of experts in the biomedical domain. This paper presents an overview of the ninth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2021. This year, a new question answering task, named Synergy, is introduced to support researchers studying COVID-19 and to measure the ability of the participating teams to discern information while the problem is still developing. In total, 42 teams with more than 170 systems were registered to participate in the four tasks of the challenge. As in previous years, the evaluation results show a performance gain over the baselines, which indicates the continuous improvement of the state of the art in this field.
Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Nentidis, Anastasios, Krithara, Anastasia, Bougiatiotis, Konstantinos, Krallinger, Martin, Rodriguez-Penagos, Carlos, Villegas, Marta, Paliouras, Georgios
In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks have been organized yearly since 2012, in which different teams develop systems that compete on the same demanding benchmark datasets, which represent the real information needs of experts in the biomedical domain. This year, the challenge has been extended with the introduction of a new task on medical semantic indexing in Spanish. In total, 34 teams with more than 100 systems participated in the three tasks of the challenge. As in previous years, the results of the evaluation reveal that the top-performing systems managed to outperform the strong baselines, which suggests that state-of-the-art systems keep pushing the frontier of research through continuous improvements.