crag
- North America > United States (0.14)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Leisure & Entertainment (1.00)
- Media > Music (0.93)
Supplemental Materials
We bear all responsibility in case of violation of rights, etc., and confirm the data license. This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license, which permits sharing and adapting the work provided it is not used for commercial purposes and appropriate credit is given. Please refer to Section 3 for our hosting plan. In this section, we use the framework of Datasheets for Datasets [? ] to form a datasheet for CRAG. For what purpose was the dataset created? Was there a specific task in mind?
- Information Technology (0.90)
- Law (0.70)
CRAG -- Comprehensive RAG Benchmark
Yang, Xiao, Sun, Kai, Xin, Hao, Sun, Yushi, Bhalla, Nikita, Chen, Xiangsen, Choudhary, Sajal, Gui, Rongze Daniel, Jiang, Ziran Will, Jiang, Ziyu, Kong, Lingkun, Moran, Brian, Wang, Jiaqi, Xu, Yifan Ethan, Yan, An, Yang, Chenyu, Yuan, Eting, Zha, Hanwen, Tang, Nan, Chen, Lei, Scheffer, Nicolas, Liu, Yue, Shah, Nirav, Wanga, Rakesh, Kumar, Anuj, Yih, Wen-tau, Dong, Xin Luna
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Models' (LLMs) lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamism ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve no more than 34% accuracy on CRAG, adding RAG in a straightforward manner improves accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions.
- Leisure & Entertainment (1.00)
- Media > Music (0.68)
- Media > Film (0.68)
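The abstract above rewards systems for abstaining rather than hallucinating. A minimal sketch of such a truthfulness metric, assuming (as the abstract suggests but does not spell out) that each answer is graded correct, missing, or hallucinated, with hallucinations penalized:

```python
# Hypothetical CRAG-style truthfulness score: +1 for a correct answer,
# 0 for a "missing" answer (the system declines to answer), and -1 for a
# hallucination, so guessing wrongly scores worse than abstaining.
def crag_score(grades):
    points = {"correct": 1.0, "missing": 0.0, "hallucination": -1.0}
    return sum(points[g] for g in grades) / len(grades)

grades = ["correct", "correct", "missing", "hallucination"]
print(crag_score(grades))  # 0.25
```

Under this scheme a system that hallucinates on hard questions can score below one that simply says "I don't know," which is the behavior the benchmark's hallucination-free accuracy figures measure.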
Clustered Retrieved Augmented Generation (CRAG)
Akesson, Simon, Santos, Frances A.
Providing external knowledge to Large Language Models (LLMs) is key to using these models in real-world applications, for several reasons: incorporating up-to-date content in real time, providing access to domain-specific knowledge, and helping prevent hallucination. The vector database-based Retrieval Augmented Generation (RAG) approach has been widely adopted to this end, so that any part of the external knowledge can be retrieved and provided to an LLM as input context. Despite the RAG approach's success, it can still be unfeasible for some applications, because the retrieved context may demand a longer context window than the LLM supports. Even when the retrieved context fits into the context window, the number of tokens may be substantial and, consequently, impact cost and processing time, becoming impractical for most applications. To address these issues, we propose CRAG, a novel approach that effectively reduces the number of prompting tokens without degrading the quality of the generated response compared to a solution using RAG. Through our experiments, we show that CRAG can reduce the number of tokens by at least 46%, achieving more than 90% in some cases, compared to RAG. Moreover, the number of tokens with CRAG does not increase considerably as the number of reviews analyzed grows, unlike RAG, where the number of tokens is almost 9x higher with 75 reviews than with 4 reviews.
- Europe > France (0.05)
- South America > Brazil > São Paulo > Campinas (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > Kazakhstan > Akmola Region > Astana (0.04)
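The token savings described above come from collapsing redundant retrieved chunks before prompting. This is a minimal sketch of that idea only, not the authors' pipeline: greedy Jaccard-similarity grouping stands in for whatever clustering the full approach uses, and one representative per group is kept in the prompt.

```python
import re

def token_set(text):
    # Crude word-level tokenizer; a real pipeline would use embeddings.
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    sa, sb = token_set(a), token_set(b)
    return len(sa & sb) / len(sa | sb)

def cluster_chunks(chunks, threshold=0.5):
    """Greedily group near-duplicate chunks; keep one representative each."""
    representatives = []
    for chunk in chunks:
        if all(jaccard(chunk, rep) < threshold for rep in representatives):
            representatives.append(chunk)  # starts a new cluster
    return representatives

reviews = [
    "battery life is great, lasts two days",
    "great battery life, easily lasts two days",
    "screen cracked after one week",
]
context = cluster_chunks(reviews)
print(len(context))  # 2 representatives instead of 3 chunks
```

The prompt then contains only the representatives, which is why the token count grows slowly as the number of reviews rises: additional near-duplicate reviews fold into existing clusters instead of adding context.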
Corrective Retrieval Augmented Generation
Yan, Shi-Qi, Gu, Jia-Chen, Zhu, Yun, Ling, Zhen-Hua
Large language models (LLMs) inevitably exhibit hallucinations, since the accuracy of generated text cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. In addition, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant content. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Poland > Podlaskie Province (0.14)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.05)
- (10 more...)
- Leisure & Entertainment > Sports > Tennis (0.46)
- Government (0.46)
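The control flow in the abstract above (evaluator confidence deciding between trusting retrieval, falling back to web search, or blending both) can be sketched roughly as follows. The thresholds and the evaluator itself are placeholders, not the paper's trained components:

```python
# Hedged sketch of confidence-triggered retrieval actions: the evaluator
# scores the retrieved documents for a query, and the resulting confidence
# selects one of three actions before generation.
def decide_action(confidence, upper=0.7, lower=0.3):
    if confidence >= upper:
        return "correct"    # trust retrieval; refine the documents, then generate
    if confidence <= lower:
        return "incorrect"  # discard the documents; fall back to web search
    return "ambiguous"      # combine refined documents with web-search results

print(decide_action(0.9))  # correct
print(decide_action(0.1))  # incorrect
print(decide_action(0.5))  # ambiguous
```

The refinement step in the "correct" and "ambiguous" branches is where the decompose-then-recompose algorithm would slot in, filtering irrelevant passages before the final prompt is assembled.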
Reading The Game: Shadow Of Mordor
For years now, some of the best, wildest, most moving or revealing stories we've been telling ourselves have come not from books, movies or TV, but from video games. So we're running an occasional series, Reading The Game, in which we take a look at some of these games from a literary perspective. They march and they argue. They taunt their human slaves and, when they pass close enough, I can hear them talking about me -- Talion, called Gravewalker, murdered Captain of Gondor brought back to life by magic and the influence of my mostly-invisible elf/wraith buddy, Celebrimbor, who is a ghost that lives in my head. I am bored out of my elf-inhabited mind.