AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

CURE: Confidence-driven Unified Reasoning Ensemble Framework for Medical Question Answering

Elshaer, Ziad, Rashed, Essam A.

arXiv.org Artificial IntelligenceOct-17-2025

High-performing medical Large Language Models (LLMs) typically require extensive fine-tuning with substantial computational resources, limiting accessibility for resource-constrained healthcare institutions. This study introduces a confidence-driven multi-model framework that leverages model diversity to enhance medical question answering without fine-tuning. Our framework employs a two-stage architecture: a confidence detection module assesses the primary model's certainty, and an adaptive routing mechanism directs low-confidence queries to Helper models with complementary knowledge for collaborative reasoning. We evaluate our approach using Qwen3-30B-A3B-Instruct, Phi-4 14B, and Gemma 2 12B across three medical benchmarks; MedQA, MedMCQA, and PubMedQA. Result demonstrate that our framework achieves competitive performance, with particularly strong results in PubMedQA (95.0\%) and MedMCQA (78.0\%). Ablation studies confirm that confidence-aware routing combined with multi-model collaboration substantially outperforms single-model approaches and uniform reasoning strategies. This work establishes that strategic model collaboration offers a practical, computationally efficient pathway to improve medical AI systems, with significant implications for democratizing access to advanced medical AI in resource-limited settings.

large language model, machine learning, question answering, (17 more...)

arXiv.org Artificial Intelligence

2510.14353

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (0.68)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering

Calvo-Bartolomé, Lorena, Aldana, Valérie, Cantarero, Karla, de Mesa, Alonso Madroñal, Arenas-García, Jerónimo, Boyd-Graber, Jordan

arXiv.org Artificial IntelligenceOct-15-2025

Multilingual question answering (QA) systems must ensure factual consistency across languages, especially for objective queries such as What is jaundice?, while also accounting for cultural variation in subjective responses. We propose MIND, a user-in-the-loop fact-checking pipeline to detect factual and cultural discrepancies in multilingual QA knowledge bases. MIND highlights divergent answers to culturally sensitive questions (e.g., Who assists in childbirth?) that vary by region and context. We evaluate MIND on a bilingual QA system in the maternal and infant health domain and release a dataset of bilingual questions annotated for factual and cultural inconsistencies. We further test MIND on datasets from other domains to assess generalization. In all cases, MIND reliably identifies inconsistencies, supporting the development of more culturally aware and factually consistent QA systems.

computational linguistic, machine learning, question answering, (17 more...)

arXiv.org Artificial Intelligence

2510.11928

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Chronological Passage Assembling in RAG framework for Temporal Question Answering

Kim, Byeongjeong, Park, Jeonghyun, Yang, Joonho, Lee, Hwanhee

arXiv.org Artificial IntelligenceOct-14-2025

Long-context question answering over narrative tasks is challenging because correct answers often hinge on reconstructing a coherent timeline of events while preserving contextual f low in a limited context window. Retrievalaugmented generation (RAG) methods aim to address this challenge by selectively retrieving only necessary document segments. However, narrative texts possess unique characteristics that limit the effectiveness of these existing approaches. Specifically, understanding narrative texts requires more than isolated segments, as the broader context and sequential relationships between segments are crucial for comprehension. To address these limitations, we propose ChronoRAG, a novel RAG framework specialized for narrative texts. This approach focuses on two essential aspects: refining dispersed document information into coherent and structured passages and preserving narrative flow by explicitly capturing and maintaining the temporal order among retrieved passages. We empirically demonstrate the effectiveness of ChronoRAG through experiments on the NarrativeQA and GutenQAdataset, showing substantial improvements in tasks requiring both factual identification and comprehension of complex sequential relationships, underscoring that reasoning over temporal order is crucial in resolving narrative QA.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2508.18748

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering

Tai, Zhenghan, Wu, Hanwei, Hu, Qingchen, Chi, Jijun, He, Hailin, Ding, Lei, Kwok, Tung Sum Thomas, Xiao, Bohuai, Hua, Yuchen, Wang, Suyuchen, Lu, Peng, Li, Muzhi, Wu, Yihong, Ma, Liheng, Huang, Jerry, Zhang, Jiayi, Zhang, Gonghao, Jiang, Chaolong, Tian, Jingrui, Lyu, Sicheng, Li, Zeyu, Han, Boyu, Mo, Fengran, Yu, Xinyue, Cui, Yufei, Zhou, Ling, Wang, Xinyu

arXiv.org Artificial IntelligenceOct-14-2025

Retrieval-Augmented Generation (RAG) is becoming increasingly essential for Question Answering (QA) in the financial sector, where accurate and contextually grounded insights from complex public disclosures are crucial. However, existing financial RAG systems face two significant challenges: (1) they struggle to process heterogeneous data formats, such as text, tables, and figures; and (2) they encounter difficulties in balancing general-domain applicability with company-specific adaptation. To overcome these challenges, we present VeritasFi, an innovative hybrid RAG framework that incorporates a multi-modal preprocessing pipeline alongside a cutting-edge two-stage training strategy for its re-ranking component. VeritasFi enhances financial QA through three key innovations: (1) A multi-modal preprocessing pipeline that seamlessly transforms heterogeneous data into a coherent, machine-readable format. (2) A tripartite hybrid retrieval engine that operates in parallel, combining deep multi-path retrieval over a semantically indexed document corpus, real-time data acquisition through tool utilization, and an expert-curated memory bank for high-frequency questions, ensuring comprehensive scope, accuracy, and efficiency. (3) A two-stage training strategy for the document re-ranker, which initially constructs a general, domain-specific model using anonymized data, followed by rapid fine-tuning on company-specific data for targeted applications. By integrating our proposed designs, VeritasFi presents a groundbreaking framework that greatly enhances the adaptability and robustness of financial RAG systems, providing a scalable solution for both general-domain and company-specific QA tasks. Code accompanying this work is available at https://github.com/simplew4y/VeritasFi.git.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

2510.10828

Country:

North America > United States > California (0.28)
North America > Canada > Ontario (0.28)

Genre: Research Report (0.53)

Industry:

Banking & Finance > Trading (1.00)
Law (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

EpiCache: Episodic KV Cache Management for Long Conversational Question Answering

Kim, Minsoo, Kundu, Arnav, Kim, Han-Byul, Dixit, Richa, Cho, Minsik

arXiv.org Artificial IntelligenceOct-14-2025

Modern large language models (LLMs) extend context lengths to millions of tokens, enabling coherent, personalized responses grounded in long conversational histories. This ability, however, hinges on Key-Value (KV) caching, whose memory grows linearly with dialogue length and quickly becomes the bottleneck in resource-constrained environments. An active line of research for reducing memory bottleneck is KV cache compression, which seeks to limit cache size while preserving accuracy. Yet existing methods face two major limitations: (i) evicting the KV cache after full-context prefill causes unbounded peak memory, and (ii) query-dependent eviction narrows the cache to a single query, leading to failure cases in multi-turn conversations. We introduce EpiCache, a training-free KV cache management framework for long conversational question answering (LongConvQA) under fixed memory budgets. EpiCache bounds cache growth through block-wise prefill and preserves topic-relevant context via episodic KV compression, which clusters conversation history into coherent episodes and applies episode-specific KV cache eviction. We further design an adaptive layer-wise budget allocation strategy that measures each layer's sensitivity to eviction and distributes the memory budget across layers accordingly. Across three LongConvQA benchmarks, EpiCache improves accuracy by up to 40%, maintains near-full KV accuracy under 4-6x compression, and reduces latency/memory by up to 2.4x/3.5x, enabling efficient multi-turn interaction under strict resource limits. Our code is available at https://github.com/apple/ml-epicache.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2509.17396

Country:

Asia (0.92)
Europe > Austria (0.28)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Supplementary Materials for MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Neural Information Processing SystemsOct-10-2025, 19:43:26 GMT

We utilize an open and widely used data format, i.e., JSON format, for the MEQA dataset. "context": "Roadside IED kills Russian major general [...]", # The context of the question "question": "Who died before AI-monitor reported it online?", "What event contains Al-Monitor is the communicator? "What event is after #1 has a victim? "Who died in the #2? major general,local commander,lieutenant general" We present a list of Datasheets [Gebru et al., 2021] for the MEQA dataset, synthesizing many of the For what purpose was the dataset created?

dataset, explanation, meqa dataset, (14 more...)

Neural Information Processing Systems

Country:

Europe > Ukraine > Kyiv Oblast > Kyiv (0.06)
North America > United States > Texas (0.05)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.42)

Add feedback

MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Neural Information Processing SystemsOct-10-2025, 19:43:24 GMT

However, there's a lack of insight into how these models perform in terms

computational linguistic, explanation, relation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.05)
Asia > Middle East > Syria (0.04)
(18 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government (1.00)
Law (0.93)
Leisure & Entertainment > Sports > Basketball (0.67)
Law Enforcement & Public Safety (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.84)
(2 more...)

Add feedback

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

Neural Information Processing SystemsOct-10-2025, 18:04:23 GMT

Seeking answers to questions within long scientific research articles is a crucial area of study that aids readers in quickly addressing their inquiries.

dataset, figure and table, l3score, (16 more...)

Neural Information Processing Systems

Country: