 question-answering


A Benchmark Dataset with Larger Context for Non-Factoid Question Answering over Islamic Text

arXiv.org Artificial Intelligence

Accessing and comprehending religious texts, particularly the Quran (the sacred scripture of Islam) and Ahadith (the corpus of the sayings or traditions of the Prophet Muhammad), in today's digital era necessitates efficient and accurate Question-Answering (QA) systems. Yet, the scarcity of QA systems tailored specifically to the detailed nature of inquiries about the Quranic Tafsir (explanation, interpretation, context of Quran for clarity) and Ahadith poses significant challenges. To address this gap, we introduce a comprehensive dataset meticulously crafted for QA purposes within the domain of Quranic Tafsir and Ahadith. This dataset comprises a robust collection of over 73,000 question-answer pairs, standing as the largest reported dataset in this specialized domain. Importantly, both questions and answers within the dataset are meticulously enriched with contextual information, serving as invaluable resources for training and evaluating tailored QA systems. However, while this paper highlights the dataset's contributions and establishes a benchmark for evaluating QA performance in the Quran and Ahadith domains, our subsequent human evaluation uncovered critical insights regarding the limitations of existing automatic evaluation techniques. The discrepancy between automatic evaluation metrics, such as ROUGE scores, and human assessments became apparent. The human evaluation indicated significant disparities: the model's verdict consistency with expert scholars ranged from 11% to 20%, while its contextual understanding spanned a broader spectrum of 50% to 90%. These findings underscore the necessity for evaluation techniques that capture the nuances and complexities inherent in understanding religious texts, surpassing the limitations of traditional automatic metrics.
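The gap between ROUGE and human judgement noted in this abstract can be illustrated with a minimal sketch. It assumes a simplified ROUGE-1 F1 over whitespace tokens (no stemming or stopword handling), and the two example sentences are hypothetical, not drawn from the dataset. It is not the paper's evaluation code.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical pair: the candidate reverses the verdict with a single word.
reference = "the action is permitted in this case"
candidate = "the action is not permitted in this case"
score = rouge1_f1(reference, candidate)
```

Here the candidate shares every reference token and adds only "not", so the unigram score stays high even though the verdict is the opposite. A lexical-overlap metric cannot see this kind of error, which is the sort of case a human evaluator would flag.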


Experimenting with Legal AI Solutions: The Case of Question-Answering for Access to Justice

arXiv.org Artificial Intelligence

Generative AI models, such as the GPT and Llama series, have significant potential to assist laypeople in answering legal questions. However, little prior work focuses on the data sourcing, inference, and evaluation of these models in the context of laypersons. To this end, we propose a human-centric legal NLP pipeline, covering data sourcing, inference, and evaluation. We introduce and release a dataset, LegalQA, with real and specific legal questions spanning from employment law to criminal law, corresponding answers written by legal experts, and citations for each answer. We develop an automatic evaluation protocol for this dataset, then show that retrieval-augmented generation from only 850 citations in the train set can match or outperform internet-wide retrieval, despite containing 9 orders of magnitude less data. Finally, we propose future directions for open-sourced efforts, which fall behind closed-sourced models.


RAG based Question-Answering for Contextual Response Prediction System

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks, including their potential as effective question-answering systems. However, to provide precise and relevant information in response to specific customer queries in industry settings, LLMs require access to a comprehensive knowledge base to avoid hallucinations. Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge. Yet, developing an accurate question-answering framework for real-world applications using RAG entails several challenges: 1) data availability issues, 2) evaluating the quality of generated content, and 3) the costly nature of human evaluation. In this paper, we introduce an end-to-end framework that employs LLMs with RAG capabilities for industry use cases. Given a customer query, the proposed system retrieves relevant knowledge documents and leverages them, along with previous chat history, to generate response suggestions for customer service agents in the contact centers of a major retail company. Through comprehensive automated and human evaluations, we show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent support to human customer service representatives by lightening their workload.
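The retrieve-then-generate flow described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions: bag-of-words cosine similarity stands in for whatever retriever the authors use, the function names `retrieve` and `build_prompt` are hypothetical, the knowledge documents are invented, and the final LLM call is omitted.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (assumed retriever)."""
    q = Counter(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, history: list[str], documents: list[str], k: int = 2) -> str:
    """Combine retrieved knowledge and chat history into one prompt string."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    past = "\n".join(history)
    return (
        f"Knowledge:\n{context}\n\nChat history:\n{past}\n\n"
        f"Customer: {query}\nSuggested reply:"
    )


# Hypothetical knowledge base and conversation.
docs = [
    "Refunds are issued within 5 business days.",
    "Stores open at 9 am on weekdays.",
    "Gift cards cannot be exchanged for cash.",
]
prompt = build_prompt("how long do refunds take", ["Customer: hello"], docs, k=1)
```

The resulting prompt would then be passed to a generator model, which produces the response suggestion shown to the customer service agent.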


Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

arXiv.org Artificial Intelligence

We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing [1], but the emergence of transformers with a context size of 1 million or more tokens [2] now enables entirely automatic approaches. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text, such as questions involving character arcs, broader themes, or the consequences of early actions later in the story. We propose a holistic pipeline for automatic data generation including question generation, answering, and model scoring using an "Evaluator". We find that a relative approach, comparing answers between models in a pairwise fashion and ranking with a Bradley-Terry model, provides a more consistent and differentiating scoring mechanism than an absolute scorer that rates answers individually. We also show that LLMs from different model families produce moderate agreement in their ratings. We ground our approach using the manually curated NarrativeQA dataset, where our evaluator shows excellent agreement with human judgement and even finds errors in the dataset. Using our automatic evaluation approach, we show that using an entire book as context produces superior reading comprehension performance compared to baseline no-context (parametric knowledge only) and retrieval-based approaches.
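The pairwise ranking idea in this abstract can be sketched with a small Bradley-Terry fit. This is a minimal illustration, assuming the evaluator's judgements arrive as (winner, loser) pairs and using the standard minorization-maximization update. The model names and comparison counts are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from a list of (winner, loser) pairs.

    Each pass applies the minorization-maximization update and then
    normalizes the strengths so they sum to 1.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    models = set()
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for other in models:
                if other == m:
                    continue
                n = pair_counts.get(frozenset((m, other)), 0.0)
                if n:
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom if denom else strength[m]
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}
    return strength


# Hypothetical side-by-side outcomes between three QA systems.
outcomes = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 2
)
strengths = bradley_terry(outcomes)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Because the fit uses every pairwise outcome jointly, a system's strength reflects whom it beat as well as how often it won, which is one reason a relative scheme can separate systems more consistently than independent absolute scores.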


MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering

arXiv.org Artificial Intelligence

Recent advancements in LLMs have shown their significant potential in tasks like text summarization and generation. Yet, they often encounter difficulty when solving complex physics problems that require arithmetic calculation and a good understanding of concepts. Moreover, many physics problems include images that contain important details required to understand the problem's context. We propose an LMM-based chatbot to answer multimodal physics MCQs. For domain adaptation, we utilize the MM-PhyQA dataset comprising Indian high school-level multimodal physics problems. To improve the LMM's performance, we experiment with two techniques, RLHF (Reinforcement Learning from Human Feedback) and Image Captioning. In image captioning, we add a detailed explanation of the diagram in each image, minimizing hallucinations and image processing errors. We further explore integrating an RLHF methodology, inspired by the ranking approach used in RLHF, to enhance the human-like problem-solving abilities of the models. The RLHF approach incorporates human feedback into the learning process of LLMs, improving the model's problem-solving skills, truthfulness, and reasoning capabilities, minimizing hallucinations in the answers, and improving quality relative to vanilla supervised fine-tuned models. We employ the LLaVA open-source model to answer multimodal physics MCQs and compare the performance with and without using RLHF.


Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented Generation

arXiv.org Artificial Intelligence

Summarization of electronic health records (EHRs) can substantially minimize 'screen time' for both patients and medical personnel. In recent years, summarization of EHRs has employed machine learning pipelines using state-of-the-art neural models. However, these models have produced less than adequate results, which are attributed to the difficulty of obtaining sufficient annotated data for training. Moreover, the requirement to consider the entire content of an EHR in summarization has resulted in poor performance because attention mechanisms in modern large language models (LLMs) add a quadratic complexity in terms of the size of the input. We propose here a method that mitigates these shortcomings by combining semantic search, retrieval augmented generation (RAG) and question-answering using the latest LLMs. In our approach, summarization is the extraction of answers to specific questions that are deemed important by subject-matter experts (SMEs). Our approach is quite efficient; requires minimal to no training; does not suffer from the 'hallucination' problem of LLMs; and ensures diversity, since the summary will not have repeated content but diverse answers to specific questions.


Framework for Question-Answering in Sanskrit through Automated Construction of Knowledge Graphs

arXiv.org Artificial Intelligence

Sanskrit (saṃskṛta) enjoys one of the largest and most varied literatures in the whole world. Extracting the knowledge from it, however, is a challenging task for multiple reasons, including the complexity of the language and the paucity of standard natural language processing tools. In this paper, we target the problem of building knowledge graphs for particular types of relationships from saṃskṛta texts. We build a natural language question-answering system in saṃskṛta that uses the knowledge graph to answer factoid questions. We design a framework for the overall system and implement two separate instances of the system on human relationships from mahābhārata and rāmāyaṇa, and one instance on synonymous relationships from bhāvaprakāśa nighaṇṭu, a technical text from āyurveda. We show that about 50% of the factoid questions can be answered correctly by the system. More importantly, we analyse the shortcomings of the system in detail for each step, and discuss the possible ways forward.


Question-Answering on Textbooks by Searching and Ranking

#artificialintelligence

Question Answering is a popular application of NLP. Transformer models trained on big datasets have dramatically improved the state-of-the-art results on Question Answering. The question answering task can be formulated in many ways. The most common application is an extractive question answering on a small context. The SQuAD dataset is a popular dataset where given a passage and a question, the model selects the word(s) representing the answer.
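The span-selection step of extractive question answering mentioned above can be sketched as follows. This is a minimal illustration, assuming a model has already produced one start score and one end score per passage token. The scores below are hypothetical, and `best_span` is an illustrative helper rather than a library function.

```python
def best_span(start_scores, end_scores, max_len=5):
    """Return (start, end) token indices maximizing start + end score.

    Only spans with start <= end and at most max_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, start in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Hypothetical per-token scores for the question "Where is the Eiffel Tower?"
tokens = "The Eiffel Tower is in Paris".split()
start_scores = [0.1, 0.2, 0.1, 0.0, 0.3, 2.5]
end_scores = [0.0, 0.1, 0.3, 0.1, 0.2, 2.8]
start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start : end + 1])
```

Restricting the search to spans where the end does not precede the start, and capping the span length, keeps the selected answer a short contiguous piece of the passage.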


Fine-tune the Entire RAG Architecture (including DPR retriever) for Question-Answering

arXiv.org Artificial Intelligence

In September 2020, Facebook open-sourced a new NLP model called Retrieval Augmented Generation (RAG) on the Hugging Face Transformer library. RAG is capable of using a set of support documents from an external knowledge base as a latent variable to generate the final output. The RAG model consists of an Input Encoder, a Neural Retriever, and an Output Generator. All three components are initialized with pre-trained transformers. However, the original Hugging Face implementation only allowed fine-tuning the Input Encoder and the Output Generator in an end-to-end manner, while the Neural Retriever needed to be trained separately. To the best of our knowledge, an end-to-end RAG implementation that trains all three components does not exist.


Intent Classification in Question-Answering Using LSTM Architectures

arXiv.org Machine Learning

Question-answering (QA) is certainly the best known and probably also one of the most complex problems within Natural Language Processing (NLP) and artificial intelligence (AI). Since the complete solution to the problem of finding a generic answer still seems far away, the wisest thing to do is to break down the problem by solving single simpler parts. Assuming a modular approach to the problem, we confine our research to intent classification for an answer, given a question. Through the use of an LSTM network, we show how this type of classification can be approached effectively and efficiently, and how it can be properly used within a basic prototype responder.

Keywords: Deep Learning, LSTM, Intent classification, Question-Answering

1 Introduction

Despite the remarkable results obtained in the different areas of Natural Language Processing, the solution to the Question-Answering problem, in its general sense, still seems far away [1].