AITopics | Question Answering

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

Appendix for "Introspective Distillation for Robust Question Answering" (Section A: Causal QA Model; Figure A1: Causal graph for QA)

Neural Information Processing Systems | Aug-15-2025, 16:39:30 GMT

Figure A1 shows the causal graph for QA. We use indirect effects as the predictions of OOD teachers. All the used datasets are open-sourced for research use. We train the teacher model following the source codes. For the student model, we use the same VQA main branch, the baseline model UpDn, as implementation.

machine learning, question answering, teacher model, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.44)

6246e04dcf42baf7c71e3a65d3d93b55-Paper-Conference.pdf

Neural Information Processing Systems | Aug-15-2025, 07:03:41 GMT

graph, inference graph, query, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Medico 2025: Visual Question Answering for Gastrointestinal Imaging

Gautam, Sushant, Thambawita, Vajira, Riegler, Michael, Halvorsen, Pål, Hicks, Steven

arXiv.org Artificial Intelligence | Aug-15-2025

The Medico 2025 challenge addresses Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, organized as part of the MediaEval task series. The challenge focuses on developing Explainable Artificial Intelligence (XAI) models that answer clinically relevant questions based on GI endoscopy images while providing interpretable justifications aligned with medical reasoning. It introduces two subtasks: (1) answering diverse types of visual questions using the Kvasir-VQA-x1 dataset, and (2) generating multimodal explanations to support clinical decision-making. The Kvasir-VQA-x1 dataset, created from 6,500 images and 159,549 complex question-answer (QA) pairs, serves as the benchmark for the challenge. By combining quantitative performance metrics and expert-reviewed explainability assessments, this task aims to advance trustworthy Artificial Intelligence (AI) in medical image analysis. Instructions, data access, and an updated guide for participation are available in the official competition repository: https://github.com/simula/MediaEval-Medico-2025

medico 2025, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2508.10869

Country:

Europe > Norway (0.17)
Europe > Czechia (0.14)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.89)

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering

Li, Yanjun, Fu, Yuqian, Qian, Tianwen, Xu, Qi'ao, Dai, Silong, Paudel, Danda Pani, Van Gool, Luc, Wang, Xiaoling

arXiv.org Artificial Intelligence | Aug-15-2025

Recent advances in Multimodal Large Language Models (MLLMs) have significantly pushed the frontier of egocentric video question answering (EgocentricQA). However, existing benchmarks and studies are mainly limited to common daily activities such as cooking and cleaning. In contrast, real-world deployment inevitably encounters domain shifts, where target domains differ substantially in both visual style and semantic content. To bridge this gap, we introduce EgoCross, a comprehensive benchmark designed to evaluate the cross-domain generalization of MLLMs in EgocentricQA. EgoCross covers four diverse and challenging domains, including surgery, industry, extreme sports, and animal perspective, representing realistic and high-impact application scenarios. It comprises approximately 1,000 QA pairs across 798 video clips, spanning four key QA tasks: prediction, recognition, localization, and counting. Each QA pair provides both OpenQA and CloseQA formats to support fine-grained evaluation. Extensive experiments show that most existing MLLMs, whether general-purpose or egocentric-specialized, struggle to generalize to domains beyond daily life, highlighting the limitations of current models. Furthermore, we conduct several pilot studies, e.g., fine-tuning and reinforcement learning, to explore potential improvements. We hope EgoCross and our accompanying analysis will serve as a foundation for advancing domain-adaptive, robust egocentric video understanding. Data and codes will be released at: https://github.com/MyUniverse0726/EgoCross

large language model, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2508.10729

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Surgery (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Learning from Natural Language Feedback for Personalized Question Answering

Salemi, Alireza, Zamani, Hamed

arXiv.org Artificial Intelligence | Aug-15-2025

Personalization is crucial for enhancing both the effectiveness and user satisfaction of language technologies, particularly in information-seeking tasks like question answering. Current approaches for personalizing large language models (LLMs) often rely on retrieval-augmented generation (RAG), followed by reinforcement learning with scalar reward signals to teach models how to use retrieved personal context. We believe that these scalar rewards sometimes provide weak, non-instructive feedback, limiting learning efficiency and personalization quality. We introduce VAC, a novel framework for personalized response generation that replaces scalar rewards with natural language feedback (NLF) that is generated conditioned on the user profiles and the question narratives. NLF serves as a rich and actionable supervision signal, allowing the policy model to iteratively refine its outputs and internalize effective personalization strategies. Training alternates between optimizing the feedback model and fine-tuning the policy model on the improved responses, resulting in a policy model that no longer requires feedback at inference. Evaluation on the LaMP-QA benchmark, which consists of three diverse domains, demonstrates consistent and significant improvements over the state-of-the-art results. Human evaluations further confirm the superior quality of the generated responses. These results demonstrate that NLF provides more effective signals for optimizing personalized question answering.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

2508.10695

Country:

Asia (1.00)
Europe (0.93)
North America > United States > Massachusetts (0.68)
North America > Canada (0.68)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

FinSage: A Multi-aspect RAG System for Financial Filings Question Answering

Wang, Xinyu, Chi, Jijun, Tai, Zhenghan, Kwok, Tung Sum Thomas, Li, Muzhi, Li, Zhuhong, He, Hailin, Hua, Yuchen, Lu, Peng, Wang, Suyuchen, Wu, Yihong, Huang, Jerry, Tian, Jingrui, Mo, Fengran, Cui, Yufei, Zhou, Ling

arXiv.org Artificial Intelligence | Aug-15-2025

Leveraging large language models in real-world settings often requires domain-specific data and tools in order to comply with the complex regulations that govern acceptable use. Within financial sectors, modern enterprises increasingly rely on Retrieval-Augmented Generation (RAG) systems to address complex compliance requirements in financial document workflows. However, existing solutions struggle to account for the inherent heterogeneity of data (e.g., text, tables, diagrams) and the evolving nature of regulatory standards used in financial filings, leading to compromised accuracy in critical information extraction. We propose the FinSage framework as a solution, utilizing a multi-aspect RAG framework tailored for regulatory compliance analysis in multi-modal financial documents. FinSage introduces three innovative components: (1) a multi-modal pre-processing pipeline that unifies diverse data formats and generates chunk-level metadata summaries, (2) a multi-path sparse-dense retrieval system augmented with query expansion (HyDE) and metadata-aware semantic search, and (3) a domain-specialized re-ranking module fine-tuned via Direct Preference Optimization (DPO) to prioritize compliance-critical content. Extensive experiments demonstrate that FinSage achieves a recall of 92.51% on 75 expert-curated questions and surpasses the best baseline method on the FinanceBench question answering dataset by 24.06% in accuracy. Moreover, FinSage has been successfully deployed as a financial question-answering agent in online meetings, where it has already served more than 1,200 people.
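As a rough illustration of how results from a sparse and a dense retrieval path might be combined, the sketch below uses reciprocal rank fusion (RRF). This is an assumption made for illustration only: the abstract does not say how FinSage merges its retrieval paths, and the function name, the document IDs, and the constant k=60 are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); documents are returned by descending total score.
    This is generic RRF, not the FinSage implementation.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break exact ties by document ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a keyword (sparse) and an embedding (dense) retriever.
sparse_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

Documents that both paths rank highly rise to the top, while a document found by only one path still survives into the fused list, where a downstream re-ranker could reorder it.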

large language model, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2504.14493

Country:

North America > United States > California (0.28)
North America > Canada > Quebec (0.28)

Genre: Research Report (0.83)

Industry:

Law (1.00)
Banking & Finance (1.00)
Government (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning -- Appendix

Neural Information Processing Systems | Aug-14-2025, 22:58:27 GMT

The knowledge seeking procedure described in Section 2.1 applies a search algorithm over the graph ... Each such query takes constant time. As mentioned in Section 2.3, the approach described in this paper can be used to answer any valid ... We proceed by induction on the number of literals |Q|. Base case. ... For the experiments on KBQA, we assume that we only have access to pairs of questions and answers, i.e., the actual inferential chain leading from the question to the answer is latent. Therefore, we resort to weak supervision to train the model. Inspired by such insight, we employ a similar technique to enhance the performance of our model.

query, relation sequence, sequence, (16 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.41)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

IP-CRR: Information Pursuit for Interpretable Classification of Chest Radiology Reports

Ge, Yuyan, Chan, Kwan Ho Ryan, Messina, Pablo, Vidal, René

arXiv.org Artificial Intelligence | Aug-14-2025

The development of AI-based methods to analyze radiology reports could lead to significant advances in medical diagnosis, from improving diagnostic accuracy to enhancing efficiency and reducing workload. However, the lack of interpretability of AI-based methods could hinder their adoption in clinical settings. In this paper, we propose an interpretable-by-design framework for classifying chest radiology reports. First, we extract a set of representative facts from a large set of reports. Then, given a new report, we query whether a small subset of the representative facts is entailed by the report, and predict a diagnosis based on the selected subset of query-answer pairs. The explanation for a prediction is, by construction, the set of selected queries and answers. We use the Information Pursuit framework to select the most informative queries, a natural language inference model to determine if a fact is entailed by the report, and a classifier to predict the disease. Experiments on the MIMIC-CXR dataset demonstrate the effectiveness of the proposed method, highlighting its potential to enhance trust and usability in medical AI.
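The query-selection loop described in this abstract can be sketched on toy data. The code below is a simplified, sample-based version of greedy information pursuit: at each step it asks the yes/no query whose answer is expected to reduce uncertainty about the label the most, given the answers so far. The sample table, the query indices, and all function names are hypothetical, and the paper's own models (a natural language inference model for answering queries and a learned classifier) are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of the empirical label distribution."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, q):
    """Expected drop in label entropy after observing the answer to query q."""
    gain = entropy([y for _, y in samples])
    for value in (0, 1):
        subset = [y for answers, y in samples if answers[q] == value]
        gain -= len(subset) / len(samples) * entropy(subset)
    return gain


def information_pursuit(samples, report_answers, max_queries=3):
    """Greedily pick queries, then predict from samples matching the answers."""
    consistent = list(samples)
    remaining = list(range(len(report_answers)))
    history = []  # selected (query index, answer) pairs: the explanation
    while remaining and len(history) < max_queries:
        if entropy([y for _, y in consistent]) <= 1e-12:
            break  # the label is already determined by the answers so far
        q = max(remaining, key=lambda i: information_gain(consistent, i))
        answer = report_answers[q]
        history.append((q, answer))
        remaining.remove(q)
        narrowed = [s for s in consistent if s[0][q] == answer]
        if not narrowed:
            break  # no toy sample matches; keep the previous candidate set
        consistent = narrowed
    prediction = Counter(y for _, y in consistent).most_common(1)[0][0]
    return prediction, history


# Hypothetical table: answers to three yes/no fact queries, plus a label.
SAMPLES = [
    ((0, 0, 0), "normal"),
    ((1, 0, 0), "normal"),
    ((0, 1, 1), "disease"),
    ((1, 1, 0), "disease"),
]

prediction, history = information_pursuit(SAMPLES, report_answers=(1, 1, 0))
print(prediction, history)
```

In this toy table the label is fully determined by the second query, so the loop stops after asking only that one, and the returned history of query-answer pairs plays the role of the explanation.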

machine learning, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2505.00191

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.51)

Grounding Multilingual Multimodal LLMs With Cultural Knowledge

Nyandwi, Jean de Dieu, Song, Yueqi, Khanuja, Simran, Neubig, Graham

arXiv.org Artificial Intelligence | Aug-13-2025

Multimodal Large Language Models excel in high-resource settings, but often misinterpret long-tail cultural entities and underperform in low-resource languages. To address this gap, we propose a data-centric approach that directly grounds MLLMs in cultural knowledge. Leveraging a large scale knowledge graph from Wikidata, we collect images that represent culturally significant entities, and generate synthetic multilingual visual question answering data. The resulting dataset, CulturalGround, comprises 22 million high-quality, culturally-rich VQA pairs spanning 42 countries and 39 languages. We train an open-source MLLM CulturalPangea on CulturalGround, interleaving standard multilingual instruction-tuning data to preserve general abilities. CulturalPangea achieves state-of-the-art performance among open models on various culture-focused multilingual multimodal benchmarks, outperforming prior models by an average of 5.0 without degrading results on mainstream vision-language tasks. Our findings show that our targeted, culturally grounded approach could substantially narrow the cultural gap in MLLMs and offer a practical path towards globally inclusive multimodal systems.

arxiv preprint arxiv, large language model, question answering, (17 more...)

arXiv.org Artificial Intelligence

2508.07414

Country:

Europe (1.00)
Africa (0.93)
Asia > Middle East (0.92)

Genre: Research Report > New Finding (0.86)

Industry:

Leisure & Entertainment (1.00)
Government (0.93)
Education (0.68)
Media > Music (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.66)

BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context

Tomar, Aditya, Sahoo, Nihar Ranjan, Bhattacharyya, Pushpak

arXiv.org Artificial Intelligence | Aug-12-2025

Evaluating social biases in language models (LMs) is crucial for ensuring fairness and minimizing the reinforcement of harmful stereotypes in AI systems. Existing benchmarks, such as the Bias Benchmark for Question Answering (BBQ), primarily focus on Western contexts, limiting their applicability to the Indian context. To address this gap, we introduce BharatBBQ, a culturally adapted benchmark designed to assess biases in Hindi, English, Marathi, Bengali, Tamil, Telugu, Odia, and Assamese. BharatBBQ covers 13 social categories, including 3 intersectional groups, reflecting prevalent biases in the Indian sociocultural landscape. Our dataset contains 49,108 examples in one language that are expanded using translation and verification to 392,864 examples in eight different languages. We evaluate five multilingual LM families across zero and few-shot settings, analyzing their bias and stereotypical bias scores. Our findings highlight persistent biases across languages and social categories and often amplified biases in Indian languages compared to English, demonstrating the necessity of linguistically and culturally grounded benchmarks for bias evaluation.

large language model, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2508.0709

Country:

Europe (1.00)
Asia > India (0.29)
North America > Mexico (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.61)
