AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts

Le, Thanh-Phong, Phan, Trung Le Chi, Nguyen, Nghia Hieu, Van Nguyen, Kiet

arXiv.org Artificial IntelligenceMar-7-2025

Document Visual Question Answering (Document VQA) challenges multimodal systems to holistically handle textual, layout, and visual modalities to provide appropriate answers. Document VQA has gained popularity in recent years due to the increasing amount of documents and the high demand for digitization. Nonetheless, most of document VQA datasets are developed in high-resource languages such as English. In this paper, we present ReceiptVQA (\textbf{Receipt} \textbf{V}isual \textbf{Q}uestion \textbf{A}nswering), the initial large-scale document VQA dataset in Vietnamese dedicated to receipts, a document kind with high commercial potentials. The dataset encompasses \textbf{9,000+} receipt images and \textbf{60,000+} manually annotated question-answer pairs. In addition to our study, we introduce LiGT (\textbf{L}ayout-\textbf{i}nfused \textbf{G}enerative \textbf{T}ransformer), a layout-aware encoder-decoder architecture designed to leverage embedding layers of language models to operate layout embeddings, minimizing the use of additional neural modules. Experiments on ReceiptVQA show that our architecture yielded promising performance, achieving competitive results compared with outstanding baselines. Furthermore, throughout analyzing experimental results, we found evident patterns that employing encoder-only model architectures has considerable disadvantages in comparison to architectures that can generate answers. We also observed that it is necessary to combine multiple modalities to tackle our dataset, despite the critical role of semantic understanding from language models. We hope that our work will encourage and facilitate future development in Vietnamese document VQA, contributing to a diverse multimodal research community in the Vietnamese language.

architecture, dataset, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2502.19202

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)
(2 more...)

Add feedback

Evaluating Answer Reranking Strategies in Time-sensitive Question Answering

Kardan, Mehmet, Piryani, Bhawna, Jatowt, Adam

arXiv.org Artificial IntelligenceMar-6-2025

Despite advancements in state-of-the-art models and information retrieval techniques, current systems still struggle to handle temporal information and to correctly answer detailed questions about past events. In this paper, we investigate the impact of temporal characteristics of answers in Question Answering (QA) by exploring several simple answer selection techniques. Our findings emphasize the role of temporal features in selecting the most relevant answers from diachronic document collections and highlight differences between explicit and implicit temporal questions.

dataset, information, publication date, (14 more...)

arXiv.org Artificial Intelligence

2503.04972

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)

Add feedback

Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations

Nguyen, Khoi Anh, Vu, Linh Yen, Duong, Thang Dinh, Duong, Thuan Nguyen, Nguyen, Huy Thanh, Dinh, Vinh Quang

arXiv.org Artificial IntelligenceMar-6-2025

Visual Question Answering (VQA) is a multimodal task requiring reasoning across textual and visual inputs, which becomes particularly challenging in low-resource languages like Vietnamese due to linguistic variability and the lack of high-quality datasets. Traditional methods often rely heavily on extensive annotated datasets, computationally expensive pipelines, and large pre-trained models, specifically in the domain of Vietnamese VQA, limiting their applicability in such scenarios. To address these limitations, we propose a training framework that combines a paraphrase-based feature augmentation module with a dynamic curriculum learning strategy. Explicitly, augmented samples are considered "easy" while raw samples are regarded as "hard". The framework then utilizes a mechanism that dynamically adjusts the ratio of easy to hard samples during training, progressively modifying the same dataset to increase its difficulty level. By enabling gradual adaptation to task complexity, this approach helps the Vietnamese VQA model generalize well, thus improving overall performance. Experimental results show consistent improvements on the OpenViVQA dataset and mixed outcomes on the ViVQA dataset, highlighting both the potential and challenges of our approach in advancing VQA for Vietnamese language.

dataset, encoder, thresh, (14 more...)

arXiv.org Artificial Intelligence

2503.03285

Country:

Europe > France > Bourgogne-Franche-Comté > Côte-d'Or > Dijon (0.04)
Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)

Add feedback

Cross-modal Causal Relation Alignment for Video Question Grounding

Chen, Weixing, Liu, Yang, Chen, Binglin, Su, Jiandong, Zheng, Yongsen, Lin, Liang

arXiv.org Artificial IntelligenceMar-4-2025

Video question grounding (VideoQG) requires models to answer the questions and simultaneously infer the relevant video segments to support the answers. However, existing VideoQG methods usually suffer from spurious cross-modal correlations, leading to a failure to identify the dominant visual scenes that align with the intended question. Moreover, vision-language models exhibit unfaithful generalization performance and lack robustness on challenging downstream tasks such as VideoQG. In this work, we propose a novel VideoQG framework named Cross-modal Causal Relation Alignment (CRA), to eliminate spurious correlations and improve the causal consistency between question-answering and video temporal grounding. Our CRA involves three essential components: i) Gaussian Smoothing Grounding (GSG) module for estimating the time interval via cross-modal attention, which is de-noised by an adaptive Gaussian filter, ii) Cross-Modal Alignment (CMA) enhances the performance of weakly supervised VideoQG by leveraging bidirectional contrastive learning between estimated video segments and QA features, iii) Explicit Causal Intervention (ECI) module for multimodal deconfounding, which involves front-door intervention for vision and back-door intervention for language. Extensive experiments on two VideoQG datasets demonstrate the superiority of our CRA in discovering visually grounded content and achieving robust question reasoning. Codes are available at https://github.com/WissingChen/CRA-GQA.

dataset, intervention, video, (15 more...)

arXiv.org Artificial Intelligence

2503.07635

Country:

Asia > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.36)

Add feedback

Structural Deep Encoding for Table Question Answering

Mouravieff, Raphaël, Piwowarski, Benjamin, Lamprier, Sylvain

arXiv.org Artificial IntelligenceMar-3-2025

Although Transformers-based architectures excel at processing textual information, their naive adaptation for tabular data often involves flattening the table structure. This simplification can lead to the loss of essential inter-dependencies between rows, columns, and cells, while also posing scalability challenges for large tables. To address these issues, prior works have explored special tokens, structured embeddings, and sparse attention patterns. In this paper, we conduct a comprehensive analysis of tabular encoding techniques, which highlights the crucial role of attention sparsity in preserving structural information of tables. We also introduce a set of novel sparse attention mask designs for tabular data, that not only enhance computational efficiency but also preserve structural integrity, leading to better overall performance.

arxiv preprint arxiv, dataset, generalization, (14 more...)

arXiv.org Artificial Intelligence

2503.01457

Country:

North America > Mexico > Mexico City > Mexico City (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Q-NL Verifier: Leveraging Synthetic Data for Robust Knowledge Graph Question Answering

Schwabe, Tim, Siebel, Louisa, Valach, Patrik, Acosta, Maribel

arXiv.org Artificial IntelligenceMar-3-2025

Question answering (QA) requires accurately aligning user questions with structured queries, a process often limited by the scarcity of high-quality query-natural language (Q-NL) pairs. To overcome this, we present Q-NL Verifier, an approach to generating high-quality synthetic pairs of queries and NL translations. Our approach relies on large language models (LLMs) to generate semantically precise natural language paraphrases of structured queries. Building on these synthetic Q-NL pairs, we introduce a learned verifier component that automatically determines whether a generated paraphrase is semantically equivalent to the original query. Our experiments with the well-known LC-QuAD 2.0 benchmark show that Q-NL Verifier generalizes well to paraphrases from other models and even human-authored translations. Our approach strongly aligns with human judgments across varying query complexities and outperforms existing NLP metrics in assessing semantic correctness. We also integrate the verifier into QA pipelines, showing that verifier-filtered synthetic data has significantly higher quality in terms of translation correctness and enhances NL to Q translation accuracy. Lastly, we release an updated version of the LC-QuAD 2.0 benchmark containing our synthetic Q-NL pairs and verifier scores, offering a new resource for robust and scalable QA.

dataset, query, translation, (15 more...)

arXiv.org Artificial Intelligence

2503.01385

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(13 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(2 more...)

Add feedback

SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking

Nguyen, Nam V., Tran, Dien X., Tran, Thanh T., Hoang, Anh T., Duong, Tai V., Le, Di T., Le, Phuc-Lu

arXiv.org Artificial IntelligenceMar-2-2025

The rise of misinformation, exacerbated by Large Language Models (LLMs) like GPT and Gemini, demands robust fact-checking solutions, especially for low-resource languages like Vietnamese. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, often trading accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework integrating Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97\% strict accuracy on ISE-DSC01 and 80.82\% on ViWikiFC, securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed 7x while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation. The source code is available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.

computational linguistic, wang, zhang, (14 more...)

arXiv.org Artificial Intelligence

2503.00955

Country:

Asia > North Korea (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
(14 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Media > News (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.87)

Add feedback

Exploring Rewriting Approaches for Different Conversational Tasks

Tanjim, Md Mehrab, Rossi, Ryan A., Rimer, Mike, Chen, Xiang, Kim, Sungchul, Muppala, Vaishnavi, Yu, Tong, Hu, Zhengmian, Sinha, Ritwik, Zhang, Wei, Burhanuddin, Iftikhar Ahamath, Dernoncourt, Franck

arXiv.org Artificial IntelligenceFeb-27-2025

Conversational assistants often require a question rewriting algorithm that leverages a subset of past interactions to provide a more meaningful (accurate) answer to the user's question or request. However, the exact rewriting approach may often depend on the use case and application-specific tasks supported by the conversational assistant, among other constraints. In this paper, we systematically investigate two different approaches, denoted as rewriting and fusion, on two fundamentally different generation tasks, including a text-to-text generation task and a multimodal generative task that takes as input text and generates a visualization or data table that answers the user's question. Our results indicate that the specific rewriting or fusion approach highly depends on the underlying use case and generative task. In particular, we find that for a conversational question-answering assistant, the query rewriting approach performs best, whereas for a data analysis assistant that generates visualizations and data tables based on the user's conversation with the assistant, the fusion approach works best. Notably, we explore two datasets for the data analysis assistant use case, for short and long conversations, and we find that query fusion always performs better, whereas for the conversational text-based question-answering, the query rewrite approach performs best.

dataset, query, query fusion, (14 more...)

arXiv.org Artificial Intelligence

2502.1886

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.55)

Add feedback

Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement

Kong, Yaxuan, Yang, Yiyuan, Hwang, Yoontae, Du, Wenjie, Zohren, Stefan, Wang, Zhangyang, Jin, Ming, Wen, Qingsong

arXiv.org Artificial IntelligenceFeb-26-2025

Time series data are foundational in finance, healthcare, and energy domains. However, most existing methods and datasets remain focused on a narrow spectrum of tasks, such as forecasting or anomaly detection. To bridge this gap, we introduce Time Series Multi-Task Question Answering (Time-MQA), a unified framework that enables natural language queries across multiple time series tasks - numerical analytical tasks and open-ended question answering with reasoning. Central to Time-MQA is the TSQA dataset, a large-scale dataset containing $\sim$200k question-answer pairs derived from diverse time series spanning environment, traffic, etc. This comprehensive resource covers various time series lengths and promotes robust model development. We further demonstrate how continually pre-training large language models (Mistral 7B, Llama-3 8B, and Qwen-2.5 7B) on the TSQA dataset enhanced time series reasoning capabilities, moving beyond mere numeric tasks and enabling more advanced and intuitive interactions with temporal data. The complete TSQA dataset, models, executable codes, user study questionnaires for evaluation, and results have all been open-sourced.

arxiv preprint arxiv, dataset, forecasting, (14 more...)

arXiv.org Artificial Intelligence

2503.01875

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Uncertainty Quantification in Retrieval Augmented Question Answering

Perez-Beltrachini, Laura, Lapata, Mirella

arXiv.org Artificial IntelligenceFeb-25-2025

Retrieval augmented Question Answering (QA) helps QA models overcome knowledge gaps by incorporating retrieved evidence, typically a set of passages, alongside the question at test time. Previous studies show that this approach improves QA performance and reduces hallucinations, without, however, assessing whether the retrieved passages are indeed useful at answering correctly. In this work, we propose to quantify the uncertainty of a QA model via estimating the utility of the passages it is provided with. We train a lightweight neural model to predict passage utility for a target QA model and show that while simple information theoretic metrics can predict answer correctness up to a certain extent, our approach efficiently approximates or outperforms more expensive sampling-based methods. Code and data are available at https://github.com/lauhaide/ragu.

computational linguistic, proceedings, qa model, (15 more...)

arXiv.org Artificial Intelligence

2502.18108

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Iran (0.04)
North America > United States > New York (0.04)
(19 more...)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback