AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

Exploring Models and Data for Image Question Answering

Mengye Ren, Ryan Kiros, Richard Zemel

Neural Information Processing SystemsOct-2-2025, 10:03:51 GMT

This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset are also presented.

algorithm, dataset, ground truth, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
(2 more...)

Add feedback

Attribution Gradients: Incrementally Unfolding Citations for Critical Examination of Attributed AI Answers

Kambhamettu, Hita, Hwang, Alyssa, Laban, Philippe, Head, Andrew

arXiv.org Artificial IntelligenceOct-2-2025

AI question answering systems increasingly generate responses with attributions to sources. However, the task of verifying the actual content of these attributions is in most cases impractical. In this paper, we present attribution gradients as a solution. Attribution gradients provide integrated, incremental affordances for diving into an attributed passage. A user can decompose a sentence of an answer into its claims. For each claim, the user can view supporting and contradictory excerpts mined from sources. Those excerpts serve as clickable conduits into the source (in our application, scientific papers). When evidence itself contains more citations, the UI unpacks the evidence into excerpts from the cited sources. These features of attribution gradients facilitate concurrent interconnections among answer, claim, excerpt, and context. In a usability study, we observed greater engagement with sources and richer revision in a task where participants revised an attributed AI answer with attribution gradients and a baseline.

attribution gradient, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.00361

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Add feedback

A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

Vepa, Arvind Murari, Yu, Yannan, Gan, Jingru, Cuturrufo, Anthony, Li, Weikai, Wang, Wei, Scalzo, Fabien, Sun, Yizhou

arXiv.org Artificial IntelligenceOct-2-2025

We introduce mpLLM, a prompt-conditioned hierarchical mixture-of-experts (MoE) architecture for visual question answering over multi-parametric 3D brain MRI (mpMRI). mpLLM routes across modality-level and token-level projection experts to fuse multiple interrelated 3D modalities, enabling efficient training without image-report pretraining. To address limited image-text paired supervision, mpLLM integrates a synthetic visual question answering (VQA) protocol that generates medically relevant VQA from segmentation annotations, and we collaborate with medical experts for clinical validation. mpLLM outperforms strong medical VLM baselines by 5.3% on average across multiple mpMRI datasets. Our study features three main contributions: (1) the first clinically validated VQA dataset for 3D brain mpMRI, (2) a novel multimodal LLM that handles multiple interrelated 3D modalities, and (3) strong empirical results that demonstrate the medical utility of our methodology. Ablations highlight the importance of modality-level and token-level experts and prompt-conditioned routing.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2509.25889

Country: Europe (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization

Soni, Sarvesh, Demner-Fushman, Dina

arXiv.org Artificial IntelligenceOct-2-2025

Automated approaches to answer patient-posed health questions are rising, but selecting among systems requires reliable evaluation. The current gold standard for evaluating the free-text artificial intelligence (AI) responses--human expert review--is labor-intensive and slow, limiting scalability. Automated metrics are promising yet variably aligned with human judgments and often context-dependent. To address the feasibility of automating the evaluation of AI responses to hospitalization-related questions posed by patients, we conducted a large systematic study of evaluation approaches. Across 100 patient cases, we collected responses from 28 AI systems (2800 total) and assessed them along three dimensions: whether a system response (1) answers the question, (2) appropriately uses clinical note evidence, and (3) uses general medical knowledge. Using clinician-authored reference answers to anchor metrics, automated rankings closely matched expert ratings. Our findings suggest that carefully designed automated evaluation can scale comparative assessment of AI systems and support patient-clinician communication.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2510.00436

Country:

North America > United States (1.00)
Asia (0.68)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Health Care Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.47)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs

Wang, Junying, Zhang, Zicheng, Shen, Ye, Wu, Yalun, Liang, Yingji, Guo, Yijin, Wen, Farong, Li, Wenzhe, Zhao, Xuezhi, Jia, Qi, Zhai, Guangtao

arXiv.org Artificial IntelligenceOct-1-2025

High-quality, multi-modal benchmarks are crucial for advancing scientific reasoning in large models yet their manual creation is costly and unscalable. To address this bottleneck, we explore the potential for transforming Text-Only QA Pairs (TQAs) into high-quality Multi-Modal QA Pairs (MMQAs), which include three parts: 1) Task Definition \& Evaluation Rubric: We develop a TQA-to-MMQA framework and establish a comprehensive, multi-dimensional MMQA quality rubric that provides principles for the transformation. 2) Benchmark Construction: Then we construct two extensive benchmarks to rigorously evaluate state-of-the-art generation \& understanding models on the distinct tasks of MMQA generation \& MMQA quality evaluation. 3) Preliminary Solution: We develop an agentic system (Q-Mirror), which operationalizes our framework by integrating MMQA generation and evaluation into a closed loop for iterative refinement. Our experiments show that while state-of-the-art models can generate MMQAs, their outputs still leave substantial gaps, underscoring the need for reliable evaluation. We further demonstrate that top-tier understanding models align closely with human judgment in MMQA quality assessment. Leveraging both insights, the Q-Mirror agent raises average scores from 78.90 to 85.22 and pass rates from 72\% to 95\%, offering a practical path to large-scale scientific benchmarks.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2509.24297

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA

Liang, Jianxin, Yue, Tan, Wang, Yuxuan, Wang, Yueqian, Yin, Zhihan, Zhang, Huishuai, Zhao, Dongyan

arXiv.org Artificial IntelligenceSep-30-2025

The performance of Video Question Answering (VideoQA) models is fundamentally constrained by the nature of their supervision, which typically consists of isolated, factual question-answer pairs. This "bag-of-facts" approach fails to capture the underlying narrative and causal structure of events, limiting models to a shallow understanding of video content. To move beyond this paradigm, we introduce a framework to synthesize richer supervisory signals. We propose two complementary strategies: Question-Based Paraphrasing (QBP), which synthesizes the diverse inquiries (what, how, why) from a video's existing set of question-answer pairs into a holistic narrative paragraph that reconstructs the video's event structure; and Question-Based Captioning (QBC), which generates fine-grained visual rationales, grounding the answer to each question in specific, relevant evidence. Leveraging powerful generative models, we use this synthetic data to train VideoQA models under a unified next-token prediction objective. Extensive experiments on STAR and NExT-QA validate our approach, demonstrating significant accuracy gains and establishing new state-of-the-art results, such as improving a 3B model to 72.5\% on STAR (+4.9\%) and a 7B model to 80.8\% on NExT-QA. Beyond accuracy, our analysis reveals that both QBP and QBC substantially enhance cross-dataset generalization, with QBP additionally accelerating model convergence by over 2.5x. These results demonstrate that shifting data synthesis from isolated facts to narrative coherence and grounded rationales yields a more accurate, efficient, and generalizable training paradigm.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2509.24445

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)

Add feedback

AnveshanaAI: A Multimodal Platform for Adaptive AI/ML Education through Automated Question Generation and Interactive Assessment

Thakur, Rakesh, Khandelwal, Diksha, Tiwari, Shreya

arXiv.org Artificial IntelligenceSep-30-2025

We propose AnveshanaAI, an application-based learning platform for artificial intelligence. With AnveshanaAI, learners are presented with a personalized dashboard featuring streaks, levels, badges, and structured navigation across domains such as data science, machine learning, deep learning, transformers, generative AI, large language models, and multimodal AI, with scope to include more in the future. The platform incorporates gamified tracking with points and achievements to enhance engagement and learning, while switching between Playground, Challenges, Simulator, Dashboard, and Community supports exploration and collaboration. Unlike static question repositories used in existing platforms, AnveshanaAI ensures balanced learning progression through a dataset grounded in Bloom's taxonomy, with semantic similarity checks and explainable AI techniques improving transparency and reliability. Adaptive, automated, and domain-aware assessment methods are also employed. Experiments demonstrate broad dataset coverage, stable fine-tuning with reduced perplexity, and measurable gains in learner engagement. Together, these features illustrate how AnveshanaAI integrates adaptivity, gamification, interactivity, and explainability to support next-generation AI education.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2509.23811

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Add feedback

Memory-QA: Answering Recall Questions Based on Multimodal Memories

Jiang, Hongda, Zhang, Xinyuan, Garg, Siddhant, Arora, Rishab, Kuo, Shiun-Zu, Xu, Jiayang, Bansal, Ankur, Brossman, Christopher, Liu, Yue, Colak, Aaron, Aly, Ahmed, Kumar, Anuj, Dong, Xin Luna

arXiv.org Artificial IntelligenceSep-30-2025

We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of temporal and location information within memories, and the ability to draw upon multiple memories to answer a recall question. To address these challenges, we propose a comprehensive pipeline, Pensieve, integrating memory-specific augmentation, time- and location-aware multi-signal retrieval, and multi-memory QA fine-tuning. We created a multimodal benchmark to illustrate various real challenges in this task, and show the superior performance of Pensieve over state-of-the-art solutions (up to 14% on QA accuracy).

large language model, machine learning, recall question, (21 more...)

arXiv.org Artificial Intelligence

2509.18436

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

A Novel Differential Feature Learning for Effective Hallucination Detection and Classification

Wang, Wenkai, Lee, Vincent, Zheng, Yizhen

arXiv.org Artificial IntelligenceSep-29-2025

Large language model hallucination represents a critical challenge where outputs deviate from factual accuracy due to distributional biases in training data. While recent investigations establish that specific hidden layers exhibit differences between hallucinatory and factual content, the precise localization of hallucination signals within layers remains unclear, limiting the development of efficient detection methods. We propose a dual-model architecture integrating a Projected Fusion (PF) block for adaptive inter-layer feature weighting and a Differential Feature Learning (DFL) mechanism that identifies discriminative features by computing differences between parallel encoders learning complementary representations from identical inputs. Through systematic experiments across HaluEval's question answering, dialogue, and summarization datasets, we demonstrate that hallucination signals concentrate in highly sparse feature subsets, achieving significant accuracy improvements on question answering and dialogue tasks. Notably, our analysis reveals a hierarchical "funnel pattern" where shallow layers exhibit high feature diversity while deep layers demonstrate concentrated usage, enabling detection performance to be maintained with minimal degradation using only 1\% of feature dimensions. These findings suggest that hallucination signals are more concentrated than previously assumed, offering a pathway toward computationally efficient detection systems that could reduce inference costs while maintaining accuracy.

machine learning, natural language, question answering, (16 more...)

arXiv.org Artificial Intelligence

2509.21357

Country: Oceania > Australia (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.75)

Add feedback

From Search to Reasoning: A Five-Level RAG Capability Framework for Enterprise Data

Gill, Gurbinder, Gupta, Ritvik, Lusson, Denis, Chandrashekar, Anand, Nguyen, Donald

arXiv.org Artificial IntelligenceSep-29-2025

Retrieval-Augmented Generation (RAG) has emerged as the standard paradigm for answering questions on enterprise data. Traditionally, RAG has centered on text-based semantic search and re-ranking. However, this approach falls short when dealing with questions beyond data summarization or non-text data. This has led to various attempts to supplement RAG to bridge the gap between RAG, the implementation paradigm, and the question answering problem that enterprise users expect it to solve. Given that contemporary RAG is a collection of techniques rather than a defined implementation, discussion of RAG and related question-answering systems benefits from a problem-oriented understanding. We propose a new classification framework (L1-L5) to categorize systems based on data modalities and task complexity of the underlying question answering problems: L1 (Surface Knowledge of Unstructured Data) through L4 (Reflective and Reasoned Knowledge) and the aspirational L5 (General Intelligence). We also introduce benchmarks aligned with these levels and evaluate four state-of-the-art platforms: LangChain, Azure AI Search, OpenAI, and Corvic AI. Our experiments highlight the value of multi-space retrieval and dynamic orchestration for enabling L1-L4 capabilities. We empirically validate our findings using diverse datasets indicative of enterprise use cases.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2509.21324

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback