AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion

Zhang, Junkai, Li, Bin, Zhou, Shoujun, Du, Yue

arXiv.org Artificial IntelligenceApr-11-2025

Medical Visual Question Answering (Med-VQA) answers clinical questions using medical images, aiding diagnosis. Designing the MedVQA system holds profound importance in assisting clinical diagnosis and enhancing diagnostic accuracy. Building upon this foundation, Hierarchical Medical VQA extends Medical VQA by organizing medical questions into a hierarchical structure and making level-specific predictions to handle fine-grained distinctions. Recently, many studies have proposed hierarchical MedVQA tasks and established datasets, However, several issues still remain: (1) imperfect hierarchical modeling leads to poor differentiation between question levels causing semantic fragmentation across hierarchies. (2) Excessive reliance on implicit learning in Transformer-based cross-modal self-attention fusion methods, which obscures crucial local semantic correlations in medical scenarios. To address these issues, this study proposes a HiCA-VQA method, including two modules: Hierarchical Prompting for fine-grained medical questions and Hierarchical Answer Decoders. The hierarchical prompting module pre-aligns hierarchical text prompts with image features to guide the model in focusing on specific image regions according to question types, while the hierarchical decoder performs separate predictions for questions at different levels to improve accuracy across granularities. The framework also incorporates a cross-attention fusion module where images serve as queries and text as key-value pairs. Experiments on the Rad-Restruct benchmark demonstrate that the HiCA-VQA framework better outperforms existing state-of-the-art methods in answering hierarchical fine-grained questions. This study provides an effective pathway for hierarchical visual question answering systems, advancing medical image understanding.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2504.03135

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Visual Environment-Interactive Planning for Embodied Complex-Question Answering

Lan, Ning, Ou, Baoshan, Xie, Xuemei, Shi, Guangming

arXiv.org Artificial IntelligenceApr-1-2025

--This study focuses on Embodied Complex-Question Answering task, which means the embodied robot need to understand human questions with intricate structures and abstract semantics. The core of this task lies in making appropriate plans based on the perception of the visual environment. Existing methods often generate plans in a once-for-all manner, i.e., one-step planning . Such approach rely on large models, without sufficient understanding of the environment. Considering multi-step planning, the framework for formulating plans in a sequential manner is proposed in this paper . T o ensure the ability of our framework to tackle complex questions, we create a structured semantic space, where hierarchical visual perception and chain expression of the question essence can achieve iterative interaction. This space makes sequential task planning possible. Within the framework, we first parse human natural language based on a visual hierarchical scene graph, which can clarify the intention of the question. Then, we incorporate external rules to make a plan for current step, weakening the reliance on large models. Every plan is generated based on feedback from visual perception, with multiple rounds of interaction until an answer is obtained. This approach enables continuous feedback and adjustment, allowing the robot to optimize its action strategy. T o test our framework, we contribute a new dataset with more complex questions. Experimental results demonstrate that our approach performs excellently and stably on complex tasks. And also, the feasibility of our approach in real-world scenarios has been established, indicating its practical applicability. Index T erms --Embodied complex-question answering, task planning, language parsing, structured semantic space. HE development of versatile embodied agents capable of understanding natural language commands in indoor environments and executing various tasks through visual interaction has been a long-standing goal.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TCSVT.2025.3538860

2504.00775

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.91)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.82)
(3 more...)

Add feedback

SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering

Li, Bingxin

arXiv.org Artificial IntelligenceApr-1-2025

Multimodal models integrating speech and vision hold significant potential for advancing human-computer interaction, particularly in Speech-Based Visual Question Answering (SBVQA) where spoken questions about images require direct audio-visual understanding. Existing approaches predominantly focus on text-visual integration, leaving speech-visual modality gaps underexplored due to their inherent heterogeneity. To this end, we introduce SViQA, a unified speech-vision model that directly processes spoken questions without text transcription. Building upon the LLaVA architecture, our framework bridges auditory and visual modalities through two key innovations: (1) end-to-end speech feature extraction eliminating intermediate text conversion, and (2) cross-modal alignment optimization enabling effective fusion of speech signals with visual content. Extensive experimental results on the SBVQA benchmark demonstrate the proposed SViQA's state-of-the-art performance, achieving 75.62% accuracy, and competitive multimodal generalization. Leveraging speech-text mixed input boosts performance to 78.85%, a 3.23% improvement over pure speech input, highlighting SViQA's enhanced robustness and effective cross-modal attention alignment.

machine learning, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2504.01049

Country:

Asia > Middle East > Kuwait > Ahmadi Governorate > Al Ahmadi (0.05)
North America > United States (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

Shapley Revisited: Tractable Responsibility Measures for Query Answers

Bienvenu, Meghyn, Figueira, Diego, Lafourcade, Pierre

arXiv.org Artificial IntelligenceMar-28-2025

The Shapley value, originating from cooperative game theory, has been employed to define responsibility measures that quantify the contributions of database facts to obtaining a given query answer. For non-numeric queries, this is done by considering a cooperative game whose players are the facts and whose wealth function assigns 1 or 0 to each subset of the database, depending on whether the query answer holds in the given subset. While conceptually simple, this approach suffers from a notable drawback: the problem of computing such Shapley values is #P-hard in data complexity, even for simple conjunctive queries. This motivates us to revisit the question of what constitutes a reasonable responsibility measure and to introduce a new family of responsibility measures -- weighted sums of minimal supports (WSMS) -- which satisfy intuitive properties. Interestingly, while the definition of WSMSs is simple and bears no obvious resemblance to the Shapley value formula, we prove that every WSMS measure can be equivalently seen as the Shapley value of a suitably defined cooperative game. Moreover, WSMS measures enjoy tractable data complexity for a large class of queries, including all unions of conjunctive queries. We further explore the combined complexity of WSMS computation and establish (in)tractability results for various subclasses of conjunctive queries.

minimal support, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2503.22358

Country:

Europe > France (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.90)

Add feedback

AskSport: Web Application for Sports Question-Answering

Onofre, Enzo B, Moraes, Leonardo M P, Aguiar, Cristina D

arXiv.org Artificial IntelligenceMar-26-2025

This paper introduces AskSport, a question-answering web application about sports. It allows users to ask questions using natural language and retrieve the three most relevant answers, including related information and documents. The paper describes the characteristics and functionalities of the application, including use cases demonstrating its ability to return names and numerical values. AskSport and its implementation are available for public access on HuggingFace.

artificial intelligence, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2503.21067

Country: South America > Brazil > São Paulo (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.92)

Add feedback

When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD

Mandal, Paul K.

arXiv.org Artificial IntelligenceMar-23-2025

In this paper, I investigate the effectiveness of dataset cartography for extractive question answering on the SQuAD dataset. I begin by analyzing annotation artifacts in SQuAD and evaluate the impact of two adversarial datasets, AddSent and AddOneSent, on an ELECTRA-small model. Using training dynamics, I partition SQuAD into easy-to-learn, ambiguous, and hard-to-learn subsets. I then compare the performance of models trained on these subsets to those trained on randomly selected samples of equal size. Results show that training on cartography-based subsets does not improve generalization to the SQuAD validation set or the AddSent adversarial set. While the hard-to-learn subset yields a slightly higher F1 score on the AddOneSent dataset, the overall gains are limited. These findings suggest that dataset cartography provides little benefit for adversarial robustness in SQuAD-style QA tasks. I conclude by comparing these results to prior findings on SNLI and discuss possible reasons for the observed differences.

dataset cartography, machine learning, question answering, (17 more...)

arXiv.org Artificial Intelligence

2503.1829

Country:

North America > United States > Texas > Travis County > Austin (0.28)
North America > United States > Colorado (0.05)
Europe > Italy > Tuscany > Florence (0.05)
(9 more...)

Genre: Research Report > New Finding (0.87)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.37)

Add feedback

Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering

Lee, DongGeon, Park, Ahjeong, Lee, Hyeri, Nam, Hyeonseo, Maeng, Yunho

arXiv.org Artificial IntelligenceMar-21-2025

Non-factoid question-answering (NFQA) poses a significant challenge due to its open-ended nature, diverse intents, and the need for multi-aspect reasoning, which renders conventional factoid QA approaches, including retrieval-augmented generation (RAG), inadequate. Unlike factoid questions, non-factoid questions (NFQs) lack definitive answers and require synthesizing information from multiple sources across various reasoning dimensions. To address these limitations, we introduce Typed-RAG, a type-aware multi-aspect decomposition framework within the RAG paradigm for NFQA. Typed-RAG classifies NFQs into distinct types -- such as debate, experience, and comparison -- and applies aspect-based decomposition to refine retrieval and generation strategies. By decomposing multi-aspect NFQs into single-aspect sub-queries and aggregating the results, Typed-RAG generates more informative and contextually relevant responses. To evaluate Typed-RAG, we introduce Wiki-NFQA, a benchmark dataset covering diverse NFQ types. Experimental results demonstrate that Typed-RAG outperforms baselines, thereby highlighting the importance of type-aware decomposition for effective retrieval and generation in NFQA. Our code and dataset are available at https://github.com/TeamNLP/Typed-RAG.

large language model, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2503.15879

Country:

North America > United States (0.28)
Europe > Portugal (0.04)
Europe > Poland (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)
Government (0.68)
Banking & Finance > Economy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.85)

Add feedback

ECLAIR: Enhanced Clarification for Interactive Responses

Murzaku, John, Liu, Zifan, Tanjim, Md Mehrab, Muppala, Vaishnavi, Chen, Xiang, Li, Yunyao

arXiv.org Artificial IntelligenceMar-19-2025

We present ECLAIR (Enhanced CLArification for Interactive Responses), a novel unified and end-to-end framework for interactive disambiguation in enterprise AI assistants. ECLAIR generates clarification questions for ambiguous user queries and resolves ambiguity based on the user's response.We introduce a generalized architecture capable of integrating ambiguity information from multiple downstream agents, enhancing context-awareness in resolving ambiguities and allowing enterprise specific definition of agents. We further define agents within our system that provide domain-specific grounding information. We conduct experiments comparing ECLAIR to few-shot prompting techniques and demonstrate ECLAIR's superior performance in clarification question generation and ambiguity resolution.

ambiguity, large language model, question answering, (19 more...)

arXiv.org Artificial Intelligence

2503.15739

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.38)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)

Add feedback

Evaluating Bias in Retrieval-Augmented Medical Question-Answering Systems

Ji, Yuelyu, Zhang, Hang, Wang, Yanshan

arXiv.org Artificial IntelligenceMar-19-2025

Medical QA systems powered by Retrieval-Augmented Generation (RAG) models support clinical decision-making but may introduce biases related to race, gender, and social determinants of health. We systematically evaluate biases in RAG-based LLM by examining demographic-sensitive queries and measuring retrieval discrepancies. Using datasets like MMLU and MedMCQA, we analyze retrieval overlap and correctness disparities. Our findings reveal substantial demographic disparities within RAG pipelines, emphasizing the critical need for retrieval methods that explicitly account for fairness to ensure equitable clinical decision-making.

large language model, question answering, retrieval-augmented medical question-answering system, (2 more...)

arXiv.org Artificial Intelligence

2503.15454

Genre: Research Report (0.69)

Industry: Health & Medicine (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.85)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

Attribution Score Alignment in Explainable Data Management

Azua, Felipe, Bertossi, Leopoldo

arXiv.org Artificial IntelligenceMar-18-2025

Different attribution-scores have been proposed to quantify the relevance of database tuples for a query answer from a database. Among them, we find Causal Responsibility, the Shapley Value, the Banzhaf Power-Index, and the Causal Effect. They have been analyzed in isolation, mainly in terms of computational properties. In this work, we start an investigation into the alignment of these scores on the basis of the queries at hand; that is, on whether they induce compatible rankings of tuples. We are able to identify vast classes of queries for which some pairs of scores are always aligned, and others for which they are not. It turns out that the presence of exogenous tuples makes a crucial difference in this regard.

natural language, question answering, tuple, (14 more...)

arXiv.org Artificial Intelligence

2503.14469

Country:

North America > United States (0.14)
North America > Dominican Republic > Azua > Azua (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)

Add feedback