AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

QuAnTS: Question Answering on Time Series

Divo, Felix, Kraus, Maurice, Nguyen, Anh Q., Xue, Hao, Razzak, Imran, Salim, Flora D., Kersting, Kristian, Dhami, Devendra Singh

arXiv.org Artificial IntelligenceNov-10-2025

Text offers intuitive access to information. This can, in particular, complement the density of numerical time series, thereby allowing improved interactions with time series models to enhance accessibility and decision-making. While the creation of question-answering datasets and models has recently seen remarkable growth, most research focuses on question answering (QA) on vision and text, with time series receiving minute attention. To bridge this gap, we propose a challenging novel time series QA (TSQA) dataset, QuAnTS, for Question Answering on Time Series data. Specifically, we pose a wide variety of questions and answers about human motion in the form of tracked skeleton trajectories. We verify that the large-scale QuAnTS dataset is well-formed and comprehensive through extensive experiments. Thoroughly evaluating existing and newly proposed baselines then lays the groundwork for a deeper exploration of TSQA using QuAnTS. Additionally, we provide human performances as a key reference for gauging the practical usability of such models. We hope to encourage future research on interacting with time series models through text, enabling better decision-making and more transparent systems.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2511.05124

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report (1.00)
Workflow (0.68)
Overview (0.67)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Add feedback

MOTOR: Multimodal Optimal Transport via Grounded Retrieval in Medical Visual Question Answering

Shaaban, Mai A., Saleem, Tausifa Jan, Papineni, Vijay Ram, Yaqub, Mohammad

arXiv.org Artificial IntelligenceNov-10-2025

Medical visual question answering (MedVQA) plays a vital role in clinical decision-making by providing contextually rich answers to image-based queries. Although vision-language models (VLMs) are widely used for this task, they often generate factually incorrect answers. Retrieval-augmented generation addresses this challenge by providing information from external sources, but risks retrieving irrelevant context, which can degrade the reasoning capabilities of VLMs. Re-ranking retrievals, as introduced in existing approaches, enhances retrieval relevance by focusing on query-text alignment. However, these approaches neglect the visual or multimodal context, which is particularly crucial for medical diagnosis. We propose MOTOR, a novel multimodal retrieval and re-ranking approach that leverages grounded captions and optimal transport. It captures the underlying relationships between the query and the retrieved context based on textual and visual information. Consequently, our approach identifies more clinically relevant contexts to augment the VLM input. Empirical analysis and human expert evaluation demonstrate that MOTOR achieves higher accuracy on MedVQA datasets, outperforming state-of-the-art methods by an average of 6.45%. Code is available at https://github.com/BioMedIA-MBZUAI/MOTOR.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-032-04978-0_44

2506.229

Country:

North America > United States (0.69)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.73)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset

Deng, Ziye, He, Ruihan, Liu, Jiaxiang, Wang, Yuan, Meng, Zijie, Jiang, Songtao, Xie, Yong, Liu, Zuozhu

arXiv.org Artificial IntelligenceNov-7-2025

Medical image grounding aims to align natural language phrases with specific regions in medical images, serving as a foundational task for intelligent diagnosis, visual question answering (VQA), and automated report generation (MRG). However, existing research is constrained by limited modality coverage, coarse-grained annotations, and the absence of a unified, generalizable grounding framework. To address these challenges, we construct a large-scale medical grounding dataset Med-GLIP-5M comprising over 5.3 million region-level annotations across seven imaging modalities, covering diverse anatomical structures and pathological findings. The dataset supports both segmentation and grounding tasks with hierarchical region labels, ranging from organ-level boundaries to fine-grained lesions. Based on this foundation, we propose Med-GLIP, a modality-aware grounding framework trained on Med-GLIP-5M. Rather than relying on explicitly designed expert modules, Med-GLIP implicitly acquires hierarchical semantic understanding from diverse training data -- enabling it to recognize multi-granularity structures, such as distinguishing lungs from pneumonia lesions. Extensive experiments demonstrate that Med-GLIP consistently outperforms state-of-the-art baselines across multiple grounding benchmarks. Furthermore, integrating its spatial outputs into downstream tasks, including medical VQA and report generation, leads to substantial performance gains. Our dataset will be released soon.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

2508.10528

Country: Asia > China > Zhejiang Province (0.15)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.94)
Health & Medicine > Therapeutic Area (0.91)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.49)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA

Safwan, Itbaan, Shaikh, Muhammad Annas, Haaris, Muhammad, Khan, Ramail, Tahir, Muhammad Atif

arXiv.org Artificial IntelligenceNov-7-2025

We present a multi-task framework for the MediaEval Medico 2025 challenge, leveraging a LoRA-tuned Florence-2 model for simultaneous visual question answering (VQA), explanation generation, and visual grounding. The proposed system integrates three curated datasets: (1) Kvasir-VQA-x1 for question-answer learning, (2) a synthetically enriched explanation dataset offering structured medical reasoning, and (3) text-to-region pairs linking visual features with segmentation masks. This multi-task setup enables the model to jointly learn visual grounding, reasoning, and interpretation, producing responses that are both accurate and interpretable. Extensive evaluation demonstrates that our approach substantially improves over single-task baselines in both answer accuracy and visual localization, highlighting the effectiveness of grounded multi-task learning for medical VQA applications.

explanation, machine learning, question answering, (14 more...)

arXiv.org Artificial Intelligence

2511.04384

Country:

North America > United States (0.14)
Asia > Pakistan (0.14)

Genre: Research Report (0.42)

Industry: Health & Medicine > Diagnostic Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.37)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.35)

Add feedback

Hierarchical Retrieval with Evidence Curation for Open-Domain Financial Question Answering on Standardized Documents

Choe, Jaeyoung, Kim, Jihoon, Jung, Woohwan

arXiv.org Artificial IntelligenceNov-7-2025

Retrieval-augmented generation (RAG) based large language models (LLMs) are widely used in finance for their excellent performance on knowledge-intensive tasks. However, standardized documents (e.g., SEC filing) share similar formats such as repetitive boilerplate texts, and similar table structures. This similarity forces traditional RAG methods to misidentify near-duplicate text, leading to duplicate retrieval that undermines accuracy and completeness. To address these issues, we propose the Hierarchical Retrieval with Evidence Curation (HiREC) framework. Our approach first performs hierarchical retrieval to reduce confusion among similar texts. It first retrieve related documents and then selects the most relevant passages from the documents. The evidence curation process removes irrelevant passages. When necessary, it automatically generates complementary queries to collect missing information. To evaluate our approach, we construct and release a Large-scale Open-domain Financial (LOFin) question answering benchmark that includes 145,897 SEC documents and 1,595 question-answer pairs. Our code and data are available at https://github.com/deep-over/LOFin-bench-HiREC.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.findings-acl.855

2505.20368

Country: North America > United States (0.88)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance > Trading (0.48)
Law > Business Law (0.35)
Government > Regional Government > North America Government > United States Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ChiMDQA: Towards Comprehensive Chinese Document QA with Fine-grained Evaluation

Gao, Jing, Luo, Shutiao, Liu, Yumeng, Li, Yuanming, Zeng, Hongji

arXiv.org Artificial IntelligenceNov-6-2025

With the rapid advancement of natural language processing (NLP) technologies, the demand for high-quality Chinese document question-answering datasets is steadily growing. To address this issue, we present the Chinese Multi-Document Question Answering Dataset(ChiMDQA), specifically designed for downstream business scenarios across prevalent domains including academic, education, finance, law, medical treatment, and news. ChiMDQA encompasses long-form documents from six distinct fields, consisting of 6,068 rigorously curated, high-quality question-answer (QA) pairs further classified into ten fine-grained categories. Through meticulous document screening and a systematic question-design methodology, the dataset guarantees both diversity and high quality, rendering it applicable to various NLP tasks such as document comprehension, knowledge extraction, and intelligent QA systems. Additionally, this paper offers a comprehensive overview of the dataset's design objectives, construction methodologies, and fine-grained evaluation system, supplying a substantial foundation for future research and practical applications in Chinese QA. The code and data are available at: https://anonymous.4open.science/r/Foxit-CHiMDQA/.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2511.03656

Country:

North America > United States (0.68)
Asia > China (0.48)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine (0.88)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Text-VQA Aug: Pipelined Harnessing of Large Multimodal Models for Automated Synthesis

Joshi, Soham, Mishra, Shwet Kamal, Gopalakrishnan, Viswanath

arXiv.org Artificial IntelligenceNov-5-2025

Creation of large-scale databases for Visual Question Answering tasks pertaining to the text data in a scene (text-VQA) involves skilful human annotation, which is tedious and challenging. With the advent of foundation models that handle vision and language modalities, and with the maturity of OCR systems, it is the need of the hour to establish an end-to-end pipeline that can synthesize Question-Answer (QA) pairs based on scene-text from a given image. We propose a pipeline for automated synthesis for text-VQA dataset that can produce faithful QA pairs, and which scales up with the availability of scene text data. Our proposed method harnesses the capabilities of multiple models and algorithms involving OCR detection and recognition (text spotting), region of interest (ROI) detection, caption generation, and question generation. These components are streamlined into a cohesive pipeline to automate the synthesis and validation of QA pairs. To the best of our knowledge, this is the first pipeline proposed to automatically synthesize and validate a large-scale text-VQA dataset comprising around 72K QA pairs based on around 44K images.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2511.02046

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

HCT-QA: A Benchmark for Question Answering on Human-Centric Tables

Ahmad, Mohammad S., Naeem, Zan A., Aupetit, Michaël, Elmagarmid, Ahmed, Eltabakh, Mohamed, Ma, Xiasong, Ouzzani, Mourad, Ruan, Chaoyi

arXiv.org Artificial IntelligenceNov-4-2025

Tabular data embedded within PDF files, web pages, and other document formats are prevalent across numerous sectors such as government, engineering, science, and business. These human-centric tables (HCTs) possess a unique combination of high business value, intricate layouts, limited operational power at scale, and sometimes serve as the only data source for critical insights. However, their complexity poses significant challenges to traditional data extraction, processing, and querying methods. While current solutions focus on transforming these tables into relational formats for SQL queries, they fall short in handling the diverse and complex layouts of HCTs and hence being amenable to querying. This paper describes HCT-QA, an extensive benchmark of HCTs, natural language queries, and related answers on thousands of tables. Our dataset includes 2,188 real-world HCTs with 9,835 QA pairs and 4,679 synthetic tables with 67.5K QA pairs. While HCTs can be potentially processed by different type of query engines, in this paper, we focus on Large Language Models as potential engines and assess their ability in processing and querying such tables.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2504.20047

Country:

North America > United States (0.46)
Asia > Pakistan (0.28)
North America > Canada (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Information Technology (0.92)
Government > Regional Government (0.46)
Transportation > Passenger (0.46)
Transportation > Air (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PolyG: Adaptive Graph Traversal for Diverse GraphRAG Questions

Liu, Renjie, Jiang, Haitian, Yan, Xiao, Tang, Bo, Li, Jinyang

arXiv.org Artificial IntelligenceNov-4-2025

GraphRAG enhances large language models (LLMs) to generate quality answers for user questions by retrieving related facts from external knowledge graphs. However, current GraphRAG methods are primarily evaluated on and overly tailored for knowledge graph question answering (KGQA) benchmarks, which are biased towards a few specific question patterns and do not reflect the diversity of real-world questions. To better evaluate GraphRAG methods, we propose a complete four-class taxonomy to categorize the basic patterns of knowledge graph questions and use it to create PolyBench, a new GraphRAG benchmark encompassing a comprehensive set of graph questions. With the new benchmark, we find that existing GraphRAG methods fall short in effectiveness (i.e., quality of the generated answers) and/or efficiency (i.e., response time or token usage) because they adopt either a fixed graph traversal strategy or free-form exploration by LLMs for fact retrieval. However, different question patterns require distinct graph traversal strategies and context formation. To facilitate better retrieval, we propose PolyG, an adaptive GraphRAG approach by decomposing and categorizing the questions according to our proposed question taxonomy. Built on top of a unified interface and execution engine, PolyG dynamically prompts an LLM to generate a graph database query to retrieve the context for each decomposed basic question. Compared with SOTA GraphRAG methods, PolyG achieves a higher win rate in generation quality and has a low response latency and token cost. Our code and benchmark are open-source at https://github.com/Liu-rj/PolyG.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2504.02112

Country:

North America > United States (1.00)
Europe (0.93)

Genre:

Research Report (0.82)
Workflow (0.68)
Overview (0.67)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

CMI-MTL: Cross-Mamba interaction based multi-task learning for medical visual question answering

Jin, Qiangguo, Zheng, Xianyao, Cui, Hui, Sun, Changming, Fang, Yuqi, Cong, Cong, Su, Ran, Wei, Leyi, Xuan, Ping, Wang, Junbo

arXiv.org Artificial IntelligenceNov-4-2025

Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent self-attention based methods struggle to effectively handle cross-modal semantic alignments between vision and language. Moreover, classification-based methods rely on predefined answer sets. Treating this task as a simple classification problem may make it unable to adapt to the diversity of free-form answers and overlook the detailed semantic information of free-form answers. In order to tackle these challenges, we introduce a Cross-Mamba Interaction based Multi-Task Learning (CMI-MTL) framework that learns cross-modal feature representations from images and texts. CMI-MTL comprises three key modules: fine-grained visual-text feature alignment (FVTA), cross-modal interleaved feature representation (CIFR), and free-form answer-enhanced multi-task learning (FFAE). FVTA extracts the most relevant regions in image-text pairs through fine-grained visual-text feature alignment. CIFR captures cross-modal sequential interactions via cross-modal interleaved feature representation. FFAE leverages auxiliary knowledge from open-ended questions through free-form answer-enhanced multi-task learning, improving the model's capability for open-ended Med-VQA. Experimental results show that CMI-MTL outperforms the existing state-of-the-art methods on three Med-VQA datasets: VQA-RAD, SLAKE, and OVQA. Furthermore, we conduct more interpretability experiments to prove the effectiveness. The code is publicly available at https://github.com/BioMedIA-repo/CMI-MTL.

machine learning, question answering, visual question, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.2312/pg.20251303

2511.01357

Country: Asia > China (0.69)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.74)

Add feedback