AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints

Amiri, Mahmoud, Bocklitz, Thomas

arXiv.org Artificial IntelligenceJun-16-2025

The rapid expansion of chemistry literature poses significant challenges for researchers seeking to efficiently access domain-specific knowledge. To support advancements in chemistry-focused natural language processing (NLP), we present ChemRxivQuest, a curated dataset of 970 high-quality question-answer (QA) pairs derived from 155 ChemRxiv preprints across 17 subfields of chemistry. Each QA pair is explicitly linked to its source text segment to ensure traceability and contextual accuracy. ChemRxivQuest was constructed using an automated pipeline that combines optical character recognition (OCR), GPT-4o-based QA generation, and a fuzzy matching technique for answer verification. The dataset emphasizes conceptual, mechanistic, applied, and experimental questions, enabling applications in retrieval-based QA systems, search engine development, and fine-tuning of domain-adapted large language models. We analyze the dataset's structure, coverage, and limitations, and outline future directions for expansion and expert validation. ChemRxivQuest provides a foundational resource for chemistry NLP research, education, and tool development.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2505.05232

Country: Europe (0.14)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)

Add feedback

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision

Gupta, Ayush, Roy, Anirban, Chellappa, Rama, Bastian, Nathaniel D., Velasquez, Alvaro, Jha, Susmit

arXiv.org Artificial IntelligenceJun-12-2025

We address the problem of video question answering (video QA) with temporal grounding in a weakly supervised setup, without any temporal annotations. Given a video and a question, we generate an open-ended answer grounded with the start and end time. For this task, we propose TOGA: a vision-language model for Temporally Grounded Open-Ended Video QA with Weak Supervision. We instruct-tune TOGA to jointly generate the answer and the temporal grounding. We operate in a weakly supervised setup where the temporal grounding annotations are not available. We generate pseudo labels for temporal grounding and ensure the validity of these labels by imposing a consistency constraint between the question of a grounding response and the response generated by a question referring to the same temporal segment. We notice that jointly generating the answers with the grounding improves performance on question answering as well as grounding. We evaluate TOGA on grounded QA and open-ended QA tasks. For grounded QA, we consider the NExT-GQA benchmark which is designed to evaluate weakly supervised grounded question answering. For open-ended QA, we consider the MSVD-QA and ActivityNet-QA benchmarks. We achieve state-of-the-art performance for both tasks on these benchmarks.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

2506.09445

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation

Long, Yuxing, Zhang, Jiyao, Pan, Mingjie, Wu, Tianshu, Kim, Taewhan, Dong, Hao

arXiv.org Artificial IntelligenceJun-12-2025

Correct use of electrical appliances has significantly improved human life quality. Unlike simple tools that can be manipulated with common sense, different parts of electrical appliances have specific functions defined by manufacturers. If we want the robot to heat bread by microwave, we should enable them to review the microwave manual first. From the manual, it can learn about component functions, interaction methods, and representative task steps about appliances. However, previous manual-related works remain limited to question-answering tasks while existing manipulation researchers ignore the manual's important role and fail to comprehend multi-page manuals. In this paper, we propose the first manual-based appliance manipulation benchmark CheckManual. Specifically, we design a large model-assisted human-revised data generation pipeline to create manuals based on CAD appliance models. With these manuals, we establish novel manual-based manipulation challenges, metrics, and simulator environments for model performance evaluation. Furthermore, we propose the first manual-based manipulation planning model ManualPlan to set up a group of baselines for the CheckManual benchmark.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2506.09343

Country: Asia (0.28)

Genre: Research Report (0.50)

Industry: Electrical Industrial Apparatus (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering

Gupta, Akash, Storkey, Amos, Lapata, Mirella

arXiv.org Artificial IntelligenceJun-12-2025

Large Multimodal Models (LMMs) often rely on in-context learning (ICL) to perform new tasks with minimal supervision. However, ICL performance, especially in smaller LMMs, is inconsistent and does not always improve monotonically with increasing examples. We hypothesize that this occurs due to the LMM being overwhelmed by additional information present in the image embeddings, which is not required for the downstream task. To address this, we propose a meta-learning approach that provides an alternative for inducing few-shot capabilities in LMMs, using a fixed set of soft prompts that are distilled from task-relevant image features and can be adapted at test time using a few examples. To facilitate this distillation, we introduce an attention-mapper module that can be easily integrated with the popular LLaVA v1.5 architecture and is jointly learned with soft prompts, enabling task adaptation in LMMs under low-data regimes with just a few gradient steps. Evaluation on the VL-ICL Bench shows that our method consistently outperforms ICL and related prompt-tuning approaches, even under image perturbations, improving task induction and reasoning across visual question answering tasks.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2506.06905

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Advancing Question Generation with Joint Narrative and Difficulty Control

Leite, Bernardo, Cardoso, Henrique Lopes

arXiv.org Artificial IntelligenceJun-10-2025

Question Generation (QG), the task of automatically generating questions from a source input, has seen significant progress in recent years. Difficulty-controllable QG (DCQG) enables control over the difficulty level of generated questions while considering the learner's ability. Additionally, narrative-controllable QG (NCQG) allows control over the narrative aspects embedded in the questions. However, research in QG lacks a focus on combining these two types of control, which is important for generating questions tailored to educational purposes. To address this gap, we propose a strategy for Joint Narrative and Difficulty Control, enabling simultaneous control over these two attributes in the generation of reading comprehension questions. Our evaluation provides preliminary evidence that this approach is feasible, though it is not effective across all instances. Our findings highlight the conditions under which the strategy performs well and discuss the trade-offs associated with its application.

difficulty level, natural language, question answering, (16 more...)

arXiv.org Artificial Intelligence

2506.06812

Country:

Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education > Assessment & Standards > Student Performance (0.55)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Add feedback

UTSA-NLP at ArchEHR-QA 2025: Improving EHR Question Answering via Self-Consistency Prompting

Shields-Menard, Sara, Reimers, Zach, Gardner, Joshua, Perry, David, Rios, Anthony

arXiv.org Artificial IntelligenceJun-9-2025

We describe our system for the ArchEHR-QA Shared Task on answering clinical questions using electronic health records (EHRs). Our approach uses large language models in two steps: first, to find sentences in the EHR relevant to a clinician's question, and second, to generate a short, citation-supported response based on those sentences. We use few-shot prompting, self-consistency, and thresholding to improve the sentence classification step to decide which sentences are essential. We compare several models and find that a smaller 8B model performs better than a larger 70B model for identifying relevant information. Our results show that accurate sentence selection is critical for generating high-quality responses and that self-consistency with thresholding helps make these decisions more reliable.

classification, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.05589

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.71)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.65)

Add feedback

ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding

Pal, Ankit, Lee, Jung-Oh, Zhang, Xiaoman, Sankarasubbu, Malaikannan, Roh, Seunghyeon, Kim, Won Jung, Lee, Meesun, Rajpurkar, Pranav

arXiv.org Artificial IntelligenceJun-6-2025

We present ReXVQA, the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology, comprising approximately 696,000 questions paired with 160,000 chest X-rays studies across training, validation, and test sets. Unlike prior efforts that rely heavily on template based queries, ReXVQA introduces a diverse and clinically authentic task suite reflecting five core radiological reasoning skills: presence assessment, location analysis, negation detection, differential diagnosis, and geometric reasoning. We evaluate eight state-of-the-art multimodal large language models, including MedGemma-4B-it, Qwen2.5-VL, Janus-Pro-7B, and Eagle2-9B. The best-performing model (MedGemma) achieves 83.24% overall accuracy. To bridge the gap between AI performance and clinical expertise, we conducted a comprehensive human reader study involving 3 radiology residents on 200 randomly sampled cases. Our evaluation demonstrates that MedGemma achieved superior performance (83.84% accuracy) compared to human readers (best radiology resident: 77.27%), representing a significant milestone where AI performance exceeds expert human evaluation on chest X-ray interpretation. The reader study reveals distinct performance patterns between AI models and human experts, with strong inter-reader agreement among radiologists while showing more variable agreement patterns between human readers and AI models. ReXVQA establishes a new standard for evaluating generalist radiological AI systems, offering public leaderboards, fine-grained evaluation splits, structured explanations, and category-level breakdowns. This benchmark lays the foundation for next-generation AI systems capable of mimicking expert-level clinical reasoning beyond narrow pathology classification. Our dataset will be open-sourced at https://huggingface.co/datasets/rajpurkarlab/ReXVQA

large language model, machine learning, question answering, (23 more...)

arXiv.org Artificial Intelligence

2506.04353

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(2 more...)

Add feedback

Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models

Mohamud, Safaa Abdullahi Moallim, Baek, Minjin, Han, Dong Seog

arXiv.org Artificial IntelligenceJun-4-2025

B. VLM Fine-Tuning and Hierarchical QA Inference For the fine-tuning and inference stages, a compact VLM is employed. All trainable parameters are fine-tuned on the collected dataset to ensure efficient adaptation. The fine-tuned weights are chosen based on the highest VQA accuracy achieved on the validation set. A hierarchical questioning technique is used to mitigate the limitations of smaller VLMs in generating long paragraphs. Instead of generating a single long paragraph, the system asks structured questions in a hierarchical manner, enabling efficient processing and resource minimization. This allows the VLM to generate meaningful scene descriptions without compromising speed or scalability. The hierarchical QA strategy optimizes inference time by dynamically selecting relevant questions based on the visual elements in the scene. For example, if the question "Is the ego vehicle moving on a straight road?" is answered affirmatively, subsequent questions such as "In which direction does the road curve?" are automatically answered as "none," reducing unnecessary computations. Similarly, if the answer to "Is it possible for the ego vehicle to turn right at the intersection?" is "no," other related questions, such as "Is there a vehicle approaching the intersection from the opposite direction?"

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

2506.02615

Country: Asia > South Korea (0.14)

Genre: Research Report (0.82)

Industry: Transportation > Ground > Road (0.33)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)
(2 more...)

Add feedback

Multi-Hop Question Generation via Dual-Perspective Keyword Guidance

Li, Maodong, Zhang, Longyin, Kong, Fang

arXiv.org Artificial IntelligenceJun-4-2025

Multi-hop question generation (MQG) aims to generate questions that require synthesizing multiple information snippets from documents to derive target answers. The primary challenge lies in effectively pinpointing crucial information snippets related to question-answer (QA) pairs, typically relying on keywords. However, existing works fail to fully utilize the guiding potential of keywords and neglect to differentiate the distinct roles of question-specific and document-specific keywords. To address this, we define dual-perspective keywords (i.e., question and document keywords) and propose a Dual-Perspective Keyword-Guided (DPKG) framework, which seamlessly integrates keywords into the multi-hop question generation process. We argue that question keywords capture the questioner's intent, whereas document keywords reflect the content related to the QA pair. Functionally, question and document keywords work together to pinpoint essential information snippets in the document, with question keywords required to appear in the generated question. The DPKG framework consists of an expanded transformer encoder and two answer-aware transformer decoders for keyword and question generation, respectively. Extensive experiments demonstrate the effectiveness of our work, showcasing its promising performance and underscoring its significant value in the MQG task.

machine learning, natural language, question answering, (20 more...)

arXiv.org Artificial Intelligence

2505.15299

Country:

Europe (0.93)
North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Media (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents

Wang, Shuting, Liu, Yunqi, Yang, Zixin, Hu, Ning, Dou, Zhicheng, Xiong, Chenyan

arXiv.org Artificial IntelligenceJun-3-2025

Querying generative AI models, e.g., large language models (LLMs), has become a prevalent method for information acquisition. However, existing query-answer datasets primarily focus on textual responses, making it challenging to address complex user queries that require visual demonstrations or explanations for better understanding. To bridge this gap, we construct a benchmark, RealVideoQuest, designed to evaluate the abilities of text-to-video (T2V) models in answering real-world, visually grounded queries. It identifies 7.5K real user queries with video response intents from Chatbot-Arena and builds 4.5K high-quality query-video pairs through a multistage video retrieval and refinement process. We further develop a multi-angle evaluation system to assess the quality of generated video answers. Experiments indicate that current T2V models struggle with effectively addressing real user queries, pointing to key challenges and future research opportunities in multimodal AI.

large language model, machine learning, question answering, (23 more...)

arXiv.org Artificial Intelligence

2506.01689

Country:

Europe (0.28)
Asia (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback