 Question Answering


Reviews: Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Neural Information Processing Systems

This paper studies the problem of handling language/text priors in the task of visual question answering. The strong performance of many state-of-the-art VQA systems is achieved largely by learning a better question encoding that captures the correlations between questions and answers while ignoring the image information. So the problem is important to the VQA research community. In general, the paper is well-written and easy to follow. Some concerns and suggestions follow: 1) The major concern is the basic intuition of the question-only adversary: the question encoding q_i from the question encoder is not necessarily the same bias that leads the VQA model f to ignore the visual content. Since f can be a deep neural network, for example a deep RNN or a deep RNN-CNN that leverages both the question embedding and the visual embedding, the non-linearity in f could turn the question embedding into an image-aware representation when generating the answer distribution.


Reviews: Chain of Reasoning for Visual Question Answering

Neural Information Processing Systems

Paper Summary: This paper presents a novel approach that performs a chain of reasoning at the object level to generate answers for visual question answering. Object-level visual embeddings are first extracted through object detection networks as the visual representation, and a sentence embedding of the question is extracted as the question representation. Based on these, a sequential model that performs multiple steps of relational inference over (compound) object embeddings, guided by the question, is used to obtain the final representation for each sub-chain inference. A concatenation of these embeddings is then used to perform answer classification. Extensive experiments have been conducted on four public datasets, and the approach achieves state-of-the-art performance on all of them.


Reviews: Learning to Specialize with Knowledge Distillation for Visual Question Answering

Neural Information Processing Systems

For example, one model might be specialized for 'what color is the umbrella?' and another for 'how many people are wearing glasses?' while at test time the question may be 'what color are the glasses?'. Specifically, they train independently ensembled base VQA models on the entire dataset, and then, while training using MCL, a subset of the models is trained using oracle assignments (as in usual MCL) while the rest are trained to imitate the base models' activations. Strengths -- The paper is very nicely written. It starts with a clear description of the problem, the observations made by the authors, and then the proposed solution -- positioning it appropriately with respect to prior work -- and then experiments. Given the small dataset, MCL and CMCL perform worse than independent ensembling, while MCL-KD performs better.
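The oracle assignment step mentioned in this review can be illustrated with a minimal sketch. It assumes the usual multiple choice learning (MCL) setup, where each training example is assigned to the k models that currently have the lowest loss on it. The function names, the `losses[m][i]` layout, and the toy numbers are hypothetical and are not taken from the paper; the distillation part of MCL-KD is not shown.

```python
def oracle_assignments(losses, k=1):
    """Return, for each example, the indices of the k models with lowest loss.

    losses[m][i] is the loss of model m on example i (hypothetical layout).
    """
    num_models = len(losses)
    num_examples = len(losses[0])
    assignments = []
    for i in range(num_examples):
        # Rank models by their loss on example i, best first.
        ranked = sorted(range(num_models), key=lambda m: losses[m][i])
        assignments.append(ranked[:k])
    return assignments


def oracle_loss(losses, k=1):
    """Average, over examples, of the summed losses of the assigned models."""
    total = 0.0
    for i, chosen in enumerate(oracle_assignments(losses, k)):
        total += sum(losses[m][i] for m in chosen)
    return total / len(losses[0])


# Toy example: two models, three examples (made-up numbers).
toy_losses = [
    [0.2, 1.5, 0.9],  # model 0
    [1.0, 0.3, 0.8],  # model 1
]
assigned = oracle_assignments(toy_losses, k=1)
```

In this sketch only the assigned models would receive the task gradient for an example; per the review, MCL-KD instead trains the non-assigned models to imitate the base models' activations.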


Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering

arXiv.org Artificial Intelligence

As an essential task in information extraction (IE), Event-Event Causal Relation Extraction (ECRE) aims to identify and classify the causal relationships between event mentions in natural language texts. However, existing research on ECRE has highlighted two critical challenges: the lack of document-level modeling and causal hallucinations. In this paper, we propose a Knowledge-guided binary Question Answering (KnowQA) method with event structures for ECRE, consisting of two stages: Event Structure Construction and Binary Question Answering. We conduct extensive experiments under both zero-shot and fine-tuning settings with large language models (LLMs) on the MECI and MAVEN-ERE datasets. Experimental results demonstrate the usefulness of event structures for document-level ECRE and the effectiveness of KnowQA, which achieves state-of-the-art results on the MECI dataset. We observe not only the effectiveness but also the high generalizability and low inconsistency of our method, particularly with complete event structures after fine-tuning the models.


A Russian Jeopardy! Data Set for Question-Answering Systems

arXiv.org Artificial Intelligence

Question answering (QA) is one of the most common NLP tasks and relates to named entity recognition, fact extraction, semantic search and some other fields. In industry, it is much appreciated in chatbots and corporate information systems. It is also a challenging task that attracted the attention of a very general audience at the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions, of which 29,375 come from the Russian analogue of Jeopardy! - "Own Game". We examine its linguistic features and the related QA task. We conclude with the prospects for a QA competition based on the data set collected from this database.


Overview of Factify5WQA: Fact Verification through 5W Question-Answering

arXiv.org Artificial Intelligence

Researchers have found that fake news spreads many times faster than real news [1]. This is a major problem, especially in today's world where social media is the key source of news for many among the younger population. Fact verification thus becomes an important task, and many media sites contribute to the cause. Manual fact verification is a tedious task, given the volume of fake news online. The Factify5WQA shared task aims to increase research towards automated fake news detection by providing a dataset with an aspect-based, question-answering-based fact verification method. Each claim and its supporting document are associated with 5W questions that help compare the two information sources. Performance in the task is measured by comparing answers using the BLEU score to assess the accuracy of the answers, followed by an accuracy measure of the classification. The task had submissions using custom training setups and pre-trained language models, among others. The best performing team posted an accuracy of 69.56%, which is a near 35% improvement over the baseline.
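As a rough illustration of comparing a predicted answer against a reference answer with a BLEU-style score, here is a minimal sketch of unigram BLEU (clipped unigram precision times a brevity penalty). This is a simplification and an assumption on our part: the shared task's exact BLEU configuration (n-gram order, smoothing, tokenization) is not stated in the abstract, and the function name `bleu1` and the example strings are hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Unigram BLEU: clipped unigram precision times the brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    # Clip each candidate token count by its count in the reference.
    clipped = sum((Counter(cand) & Counter(ref)).values())
    precision = clipped / len(cand)
    # Penalize candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * precision


# Hypothetical answer pair for a "who" question.
score = bleu1("the health ministry", "the health ministry of India")
```

Here every candidate token appears in the reference, so the score is determined entirely by the brevity penalty for the shorter answer.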


Question-Answering System for Bangla: Fine-tuning BERT-Bangla for a Closed Domain

arXiv.org Artificial Intelligence

Question-answering systems for Bengali have seen limited development, particularly in domain-specific applications. Leveraging advancements in natural language processing, this paper explores a fine-tuned BERT-Bangla model to address this gap. It presents the development of a question-answering system for Bengali using a fine-tuned BERT-Bangla model in a closed domain. The dataset was sourced from Khulna University of Engineering & Technology's (KUET) website and other relevant texts. The system was trained and evaluated with 2500 question-answer pairs generated from curated data. Key metrics, including the Exact Match (EM) score and F1 score, were used for evaluation, achieving scores of 55.26% and 74.21%, respectively. The results demonstrate promising potential for domain-specific Bengali question-answering systems. Further refinements are needed to improve performance for more complex queries.
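The Exact Match and F1 metrics named above are commonly computed per question in the style of extractive QA evaluation: EM checks whether the normalized prediction equals the normalized reference, and F1 is the harmonic mean of token-level precision and recall. The sketch below assumes that convention; the paper's exact normalization is not given, the helper names are hypothetical, and the normalization here strips only ASCII punctuation (a Bengali system would likely also need to handle marks such as the danda).

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction/reference pair.
em = exact_match("in Khulna", "Khulna")
f1 = token_f1("in Khulna", "Khulna")
```

Corpus-level scores such as those reported are then the averages of these per-question values, which is why F1 is typically higher than EM: partially overlapping answers earn partial credit.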


Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages

arXiv.org Artificial Intelligence

Automatic question generation (QG) serves a wide range of purposes, such as augmenting question-answering (QA) corpora, enhancing chatbot systems, and developing educational materials. Despite its importance, most existing datasets predominantly focus on English, resulting in a considerable gap in data availability for other languages. Cross-lingual transfer for QG (XLT-QG) addresses this limitation by allowing models trained on high-resource language datasets to generate questions in low-resource languages. In this paper, we propose a simple and efficient XLT-QG method that operates without the need for monolingual, parallel, or labeled data in the target language, utilizing a small language model. Our model, trained solely on English QA datasets, learns interrogative structures from a limited set of question exemplars, which are then applied to generate questions in the target language. Experimental results show that our method outperforms several XLT-QG baselines and achieves performance comparable to GPT-3.5-turbo across different languages. Additionally, the synthetic data generated by our model proves beneficial for training multilingual QA models. With significantly fewer parameters than large language models and without requiring additional training for target languages, our approach offers an effective solution for QG and QA tasks across various languages.


Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-Centric Summarization

arXiv.org Artificial Intelligence

Generating educational questions for fairytales or storybooks is vital for improving children's literacy ability. However, it is challenging to generate questions that capture the interesting aspects of a fairytale story with educational meaningfulness. In this paper, we propose a novel question generation method that first learns the question type distribution of an input story paragraph, and then summarizes salient events which can be used to generate high-cognitive-demand questions. To train the event-centric summarizer, we finetune a pre-trained transformer-based sequence-to-sequence model using silver samples composed of educational question-answer pairs. On a newly proposed educational question answering dataset, FairytaleQA, we show good performance of our method on both automatic and human evaluation metrics. Our work indicates the necessity of decomposing question type distribution learning and event-centric summary generation for educational question generation.


High-Order Attention Models for Visual Question Answering

Neural Information Processing Systems

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.