Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration
Sunhao Dai, Weihao Liu, Yuqi Zhou, Liang Pang, Rongju Ruan, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen
The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet, transforming the corpus of Information Retrieval (IR) systems from solely human-written to a coexistence with LLM-generated content. The impact of this surge in AIGC on IR systems remains an open question, with the primary challenge being the lack of a dedicated benchmark for researchers. In this paper, we introduce Cocktail, a comprehensive benchmark tailored for evaluating IR models in this mixed-sourced data landscape of the LLM era. Cocktail consists of 16 diverse datasets with mixed human-written and LLM-generated corpora across various text retrieval tasks and domains. Additionally, to avoid the potential bias from previously included dataset information in LLMs, we also introduce an up-to-date dataset, named NQ-UTD, with queries derived from recent events. Through conducting over 1,000 experiments to assess state-of-the-art retrieval models against the benchmarked datasets in Cocktail, we uncover a clear trade-off between ranking performance and source bias in neural retrieval models, highlighting the necessity for a balanced approach in designing future IR systems. We hope Cocktail can serve as a foundational resource for IR research in the LLM era, with all data and code publicly available at https://github.com/KID-22/Cocktail.
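The abstract contrasts ranking performance with source bias, i.e., whether a retriever systematically favors human-written or LLM-generated documents in the mixed corpus. Below is a minimal sketch (not the authors' code) of how such a bias measure can be computed, assuming a "Relative Delta"-style gap between a retriever's effectiveness on the two document sources, normalized by their mean; the metric name, function, and numbers are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a source-bias measure on a mixed human/LLM corpus.
# Assumption: bias is quantified as the percentage gap between the
# retriever's effectiveness on human-written vs. LLM-generated documents,
# normalized by their mean (a "Relative Delta"-style measure).

def relative_delta(metric_human: float, metric_llm: float) -> float:
    """Percentage gap between human-written and LLM-generated performance.

    Positive values indicate the retriever favors human-written documents;
    negative values indicate a bias toward LLM-generated documents.
    """
    mean = (metric_human + metric_llm) / 2.0
    if mean == 0.0:
        return 0.0
    return 100.0 * (metric_human - metric_llm) / mean


if __name__ == "__main__":
    # Hypothetical NDCG@10 scores for one retriever, split by document
    # source (numbers are illustrative only, not results from the paper).
    ndcg_human = 0.412
    ndcg_llm = 0.447
    print(f"Relative Delta: {relative_delta(ndcg_human, ndcg_llm):+.2f}%")
    # A negative value suggests the retriever ranks LLM-generated
    # documents higher than human-written ones for the same queries.
```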
arXiv.org Artificial Intelligence
Jul-2-2024
- Country:
  - Asia > Middle East > Qatar (0.28)
  - Europe (0.67)
  - North America > United States (1.00)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Government (0.93)
  - Health & Medicine > Therapeutic Area (1.00)
  - Information Technology (1.00)
  - Leisure & Entertainment > Sports (0.92)
  - Media > Film (1.00)
  - Media > Television (1.00)
- Technology: