AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

MULTITAT: Benchmarking Multilingual Table-and-Text Question Answering

Zhang, Xuanliang, Wang, Dingzirui, Xu, Keyan, Zhu, Qingfu, Che, Wanxiang

arXiv.org Artificial IntelligenceFeb-24-2025

Question answering on the hybrid context of tables and text (TATQA) is a critical task, with broad applications in data-intensive domains. However, existing TATQA datasets are limited to English, leading to several drawbacks: (i) They overlook the challenges of multilingual TAT-QA and cannot assess model performance in the multilingual setting. (ii) They do not reflect real-world scenarios where tables and texts frequently appear in non-English languages. To address the limitations, we propose the first multilingual TATQA dataset (MULTITAT). Specifically, we sample data from 3 mainstream TATQA datasets and translate it into 10 diverse languages. To align the model TATQA capabilities in English with other languages, we develop a baseline, Ours. Experimental results reveal that the performance on non-English data in MULTITAT drops by an average of 19.4% compared to English, proving the necessity of MULTITAT. We further analyze the reasons for this performance gap. Furthermore, Ours outperforms other baselines by an average of 3.3, demonstrating its effectiveness.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2502.17253

Country:

Asia > Thailand > Bangkok > Bangkok (0.05)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

Li, Tianpeng, Liu, Jun, Zhang, Tao, Fang, Yuanbo, Pan, Da, Wang, Mingrui, Liang, Zheng, Li, Zehuan, Lin, Mingan, Dong, Guosheng, Xu, Jianhua, Sun, Haoze, Zhou, Zenan, Chen, Weipeng

arXiv.org Artificial IntelligenceFeb-24-2025

We introduce Baichuan-Audio, an end-to-end audio large language model that seamlessly integrates audio understanding and generation. It features a text-guided aligned speech generation mechanism, enabling real-time speech interaction with both comprehension and generation capabilities. Baichuan-Audio leverages a pre-trained ASR model, followed by multi-codebook discretization of speech at a frame rate of 12.5 Hz. This multi-codebook setup ensures that speech tokens retain both semantic and acoustic information. To further enhance modeling, an independent audio head is employed to process audio tokens, effectively capturing their unique characteristics. To mitigate the loss of intelligence during pre-training and preserve the original capabilities of the LLM, we propose a two-stage pre-training strategy that maintains language understanding while enhancing audio modeling. Following alignment, the model excels in real-time speech-based conversation and exhibits outstanding question-answering capabilities, demonstrating its versatility and efficiency. The proposed model demonstrates superior performance in real-time spoken dialogue and exhibits strong question-answering abilities. Our code, model and training data are available at https://github.com/baichuan-inc/Baichuan-Audio

arxiv preprint arxiv, baichuan-audio, language model, (13 more...)

arXiv.org Artificial Intelligence

2502.17239

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts

Piryani, Bhawna, Mozafari, Jamshid, Abdallah, Abdelrahman, Doucet, Antoine, Jatowt, Adam

arXiv.org Artificial IntelligenceFeb-23-2025

Optical Character Recognition (OCR) plays a crucial role in digitizing historical and multilingual documents, yet OCR errors -- imperfect extraction of the text, including character insertion, deletion and permutation -- can significantly impact downstream tasks like question-answering (QA). In this work, we introduce a multilingual QA dataset MultiOCR-QA, designed to analyze the effects of OCR noise on QA systems' performance. The MultiOCR-QA dataset comprises 60K question-answer pairs covering three languages, English, French, and German. The dataset is curated from OCR-ed old documents, allowing for the evaluation of OCR-induced challenges on question answering. We evaluate MultiOCR-QA on various levels and types of OCR errors to access the robustness of LLMs in handling real-world digitization errors. Our findings show that QA systems are highly prone to OCR induced errors and exhibit performance degradation on noisy OCR text.

dataset, multiocr-qa, ocr error, (13 more...)

arXiv.org Artificial Intelligence

2502.16781

Country:

Europe > Austria > Tyrol > Innsbruck (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
(2 more...)

Add feedback

Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines

Long, Xinwei, Ma, Zhiyuan, Hua, Ermo, Zhang, Kaiyan, Qi, Biqing, Zhou, Bowen

arXiv.org Artificial IntelligenceFeb-23-2025

Retrieval-augmented generation (RAG) has emerged to address the knowledge-intensive visual question answering (VQA) task. Current methods mainly employ separate retrieval and generation modules to acquire external knowledge and generate answers, respectively. We propose ReAuSE, an alternative to the previous RAG model for the knowledge-based VQA task, which seamlessly integrates knowledge retriever into the generative multi-modal large language model, serving as a built-in search engine. Specifically, our model functions both as a generative retriever and an accurate answer generator. It not only helps retrieve documents from the knowledge base by producing identifiers for each document, but it also answers visual questions based on the retrieved documents. Furthermore, we propose a reinforced retrieval calibration module from relevance feedback to improve retrieval performance and align with the preferences for accurate answer generation. Extensive experiments on two representative OKVQA and A-OKVQA datasets demonstrate significant improvements ranging from 2.9\% to 9.6\% across all evaluation metrics when compared to strong baselines.

arxiv preprint arxiv, identifier, knowledge base, (14 more...)

arXiv.org Artificial Intelligence

2502.16641

Country:

Africa > Eswatini > Manzini > Manzini (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.72)
(2 more...)

Add feedback

Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid

Li, Yunfeng, Zhang, Jiqun, Liao, Guofu, Shi, Xue, Liu, Junhong

arXiv.org Artificial IntelligenceFeb-21-2025

With rapid advancements in artificial intelligence, question-answering (Q&A) systems have become essential in intelligent search engines, virtual assistants, and customer service platforms. However, in dynamic domains like smart grids, conventional retrieval-augmented generation(RAG) Q&A systems face challenges such as inadequate retrieval quality, irrelevant responses, and inefficiencies in handling large-scale, real-time data streams. This paper proposes an optimized iterative retrieval-based Q&A framework called Chats-Grid tailored for smart grid environments. In the pre-retrieval phase, Chats-Grid advanced query expansion ensures comprehensive coverage of diverse data sources, including sensor readings, meter records, and control system parameters. During retrieval, Best Matching 25(BM25) sparse retrieval and BAAI General Embedding(BGE) dense retrieval in Chats-Grid are combined to process vast, heterogeneous datasets effectively. Post-retrieval, a fine-tuned large language model uses prompt engineering to assess relevance, filter irrelevant results, and reorder documents based on contextual accuracy. The model further generates precise, context-aware answers, adhering to quality criteria and employing a self-checking mechanism for enhanced reliability. Experimental results demonstrate Chats-Grid's superiority over state-of-the-art methods in fidelity, contextual recall, relevance, and accuracy by 2.37%, 2.19%, and 3.58% respectively. This framework advances smart grid management by improving decision-making and user interactions, fostering resilient and adaptive smart grid infrastructures.

accuracy, language model, retrieval, (15 more...)

arXiv.org Artificial Intelligence

2502.15583

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Hong Kong (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models

Racha, Suraj, Joshi, Prashant, Raman, Anshika, Jangid, Nikita, Sharma, Mridul, Ramakrishnan, Ganesh, Punjabi, Nirmal

arXiv.org Artificial IntelligenceFeb-21-2025

Mental health remains a challenging problem all over the world, with issues like depression, anxiety becoming increasingly common. Large Language Models (LLMs) have seen a vast application in healthcare, specifically in answering medical questions. However, there is a lack of standard benchmarking datasets for question answering (QA) in mental health. Our work presents a novel multiple choice dataset, MHQA (Mental Health Question Answering), for benchmarking Language models (LMs). Previous mental health datasets have focused primarily on text classification into specific labels or disorders. MHQA, on the other hand, presents question-answering for mental health focused on four key domains: anxiety, depression, trauma, and obsessive/compulsive issues, with diverse question types, namely, factoid, diagnostic, prognostic, and preventive. We use PubMed abstracts as the primary source for QA. We develop a rigorous pipeline for LLM-based identification of information from abstracts based on various selection criteria and converting it into QA pairs. Further, valid QA pairs are extracted based on post-hoc validation criteria. Overall, our MHQA dataset consists of 2,475 expert-verified gold standard instances called MHQA-gold and ~56.1k pairs pseudo labeled using external medical references. We report F1 scores on different LLMs along with few-shot and supervised fine-tuning experiments, further discussing the insights for the scores.

dataset, language model, mental health, (13 more...)

arXiv.org Artificial Intelligence

2502.15418

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework

Yang, Yuming, Zhong, Jiang, Jin, Li, Huang, Jingwang, Gao, Jingpeng, Liu, Qing, Bai, Yang, Zhang, Jingyuan, Jiang, Rui, Wei, Kaiwen

arXiv.org Artificial IntelligenceFeb-20-2025

Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, crossmodal verification, and keypoint-based generation. By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation, featuring 4,738 question-answering pairs across 8 domains from real-world documents. Our evaluation reveals three critical limitations in current approaches: (1) unified multimodal embedding retrieval methods struggles in chart-based scenarios, (2) even with ground-truth retrieval, state-of-the-art MLLMs achieve only 58.19% Correctness and 73.87% Coverage scores, and (3) MLLMs demonstrate consistent text-over-visual modality bias during Chart-based MRAG reasoning. The CHARGE and Chart-MRAG Bench are released at https://github.com/Nomothings/CHARGE.git.

arxiv preprint arxiv, keypoint, retrieval, (13 more...)

arXiv.org Artificial Intelligence

2502.14864

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question Answering

Zhu, Rongzhi, Liu, Xiangyu, Sun, Zequn, Wang, Yiwei, Hu, Wei

arXiv.org Artificial IntelligenceFeb-19-2025

In this paper, we identify a critical problem, "lost-in-retrieval", in retrieval-augmented multi-hop question answering (QA): the key entities are missed in LLMs' sub-question decomposition. "Lost-in-retrieval" significantly degrades the retrieval performance, which disrupts the reasoning chain and leads to the incorrect answers. To resolve this problem, we propose a progressive retrieval and rewriting method, namely ChainRAG, which sequentially handles each sub-question by completing missing key entities and retrieving relevant sentences from a sentence graph for answer generation. Each step in our retrieval and rewriting process builds upon the previous one, creating a seamless chain that leads to accurate retrieval and answers. Finally, all retrieved sentences and sub-question answers are integrated to generate a comprehensive answer to the original question. We evaluate ChainRAG on three multi-hop QA datasets$\unicode{x2013}$MuSiQue, 2Wiki, and HotpotQA$\unicode{x2013}$using three large language models: GPT4o-mini, Qwen2.5-72B, and GLM-4-Plus. Empirical results demonstrate that ChainRAG consistently outperforms baselines in both effectiveness and efficiency.

chainrag, dataset, retrieval, (15 more...)

arXiv.org Artificial Intelligence

2502.14245

Country:

Asia > Vietnam > Da Nang > Da Nang (0.05)
Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

Jurayj, William, Cheng, Jeffrey, Van Durme, Benjamin

arXiv.org Artificial IntelligenceFeb-19-2025

Scaling the test-time compute of large language models has demonstrated impressive performance on reasoning benchmarks. However, existing evaluations of test-time scaling make the strong assumption that a reasoning system should always give an answer to any question provided. This overlooks concerns about whether a model is confident in its answer, and whether it is appropriate to always provide a response. To address these concerns, we extract confidence scores during reasoning for thresholding model responses. We find that increasing compute budget at inference time not only helps models answer more questions correctly, but also increases confidence in correct responses. We then extend the current paradigm of zero-risk responses during evaluation by considering settings with non-zero levels of response risk, and suggest a recipe for reporting evaluations under these settings.

compute budget, confidence threshold, threshold, (12 more...)

arXiv.org Artificial Intelligence

2502.13962

Country:

Asia > Middle East > Jordan (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.43)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.34)

Add feedback

TrustRAG: An Information Assistant with Retrieval Augmented Generation

Fan, Yixing, Yan, Qiang, Wang, Wenshan, Guo, Jiafeng, Zhang, Ruqing, Cheng, Xueqi

arXiv.org Artificial IntelligenceFeb-19-2025

\Ac{RAG} has emerged as a crucial technique for enhancing large models with real-time and domain-specific knowledge. While numerous improvements and open-source tools have been proposed to refine the \ac{RAG} framework for accuracy, relatively little attention has been given to improving the trustworthiness of generated results. To address this gap, we introduce TrustRAG, a novel framework that enhances \ac{RAG} from three perspectives: indexing, retrieval, and generation. Specifically, in the indexing stage, we propose a semantic-enhanced chunking strategy that incorporates hierarchical indexing to supplement each chunk with contextual information, ensuring semantic completeness. In the retrieval stage, we introduce a utility-based filtering mechanism to identify high-quality information, supporting answer generation while reducing input length. In the generation stage, we propose fine-grained citation enhancement, which detects opinion-bearing sentences in responses and infers citation relationships at the sentence-level, thereby improving citation accuracy. We open-source the TrustRAG framework and provide a demonstration studio designed for excerpt-based question answering tasks \footnote{https://huggingface.co/spaces/golaxy/TrustRAG}. Based on these, we aim to help researchers: 1) systematically enhancing the trustworthiness of \ac{RAG} systems and (2) developing their own \ac{RAG} systems with more reliable outputs.

arxiv preprint arxiv, information, trustrag, (14 more...)

arXiv.org Artificial Intelligence

2502.13719

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Overview (0.69)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)
(3 more...)

Add feedback