Bridging Language Gaps with Adaptive RAG: Improving Indonesian Language Question Answering

Christian, William, Adamlu, Daniel, Yu, Adrian, Suhartono, Derwin

arXiv.org Artificial Intelligence 

Abstract--Question Answering (QA) has seen significant improvements with the advancement of machine learning models. Further studies have enhanced QA systems by retrieving external information, an approach called Retrieval-Augmented Generation (RAG), to produce more accurate and informative answers. However, this state-of-the-art performance is predominantly achieved in English. To address this gap, we make an effort toward bridging language gaps by applying an Adaptive RAG system to the Indonesian language. The Adaptive RAG system integrates a classifier that distinguishes question complexity, which in turn determines the strategy for answering the question. To overcome the limited availability of Indonesian-language datasets, our study employs machine translation as a data augmentation approach. Experiments show that the question complexity classifier is reliable; however, we observed significant inconsistencies in the multi-retrieval answering strategy, which negatively impacted the overall evaluation when this strategy was applied.

Recent Large Language Models (LLMs) have shown remarkable performance on many natural language tasks. However, despite these advances across natural language processing, LLMs still struggle to answer questions that require knowledge-intensive background, often producing hallucinated answers [7]. LLMs tend to provide accurate answers when the entities mentioned in a question are present in their training data. Furthermore, model performance correlates significantly with entity popularity; questions about less popular entities are often answered inaccurately by LLMs [8]. Frequently updating an LLM's knowledge is not a practical solution, since training an LLM on billions or even trillions of tokens from across the internet takes too much time.
In contrast, recent studies have demonstrated that augmenting question answering with non-parametric knowledge (information not contained in the model's training data), an approach commonly referred to as Retrieval-Augmented Generation (RAG) [9], enables even smaller models to outperform models with far more parameters [10].
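The routing idea behind Adaptive RAG described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the complexity classifier is replaced by a toy heuristic (the real system uses a trained model), and the three answering strategies are hypothetical stubs standing in for LLM and retriever calls.

```python
def classify_complexity(question: str) -> str:
    """Hypothetical stand-in for the trained complexity classifier.

    A toy heuristic: questions joining multiple clauses are treated as
    complex; short factoid openers are treated as simple."""
    q = question.lower()
    if " dan " in q or " setelah " in q:   # Indonesian "and" / "after"
        return "complex"
    if q.startswith(("siapa", "kapan", "di mana")):  # who / when / where
        return "simple"
    return "moderate"

# Stub strategies; a real system would call an LLM and a retriever here.
def no_retrieval_answer(q: str) -> str:
    return f"[no-retrieval] {q}"

def single_retrieval_answer(q: str) -> str:
    return f"[single-retrieval] {q}"

def multi_retrieval_answer(q: str) -> str:
    return f"[multi-retrieval] {q}"

def answer(question: str) -> str:
    """Route the question to an answering strategy by predicted complexity."""
    label = classify_complexity(question)
    if label == "simple":
        return no_retrieval_answer(question)       # LLM alone
    if label == "moderate":
        return single_retrieval_answer(question)   # one retrieval pass
    return multi_retrieval_answer(question)        # iterative retrieval
```

The design point is that retrieval cost is paid only when the classifier predicts it is needed; as the abstract notes, errors concentrate in the multi-retrieval branch, so misrouting a question there is the costly failure mode.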
