AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark
Jianlyu Chen, Nan Wang, Chaofan Li, Bo Wang, Shitao Xiao, Han Xiao, Hao Liao, Defu Lian, Zheng Liu
arXiv.org Artificial Intelligence
Evaluation plays a crucial role in the advancement of information retrieval (IR) models. However, current benchmarks, which are based on predefined domains and human-labeled data, face limitations in addressing evaluation needs for emerging domains both cost-effectively and efficiently. To address this challenge, we propose the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench). AIR-Bench is distinguished by three key features: 1) Automated. The testing data in AIR-Bench is automatically generated by large language models (LLMs) without human intervention. 2) Heterogeneous. The testing data in AIR-Bench is generated with respect to diverse tasks, domains and languages. 3) Dynamic. The domains and languages covered by AIR-Bench are constantly augmented to provide an increasingly comprehensive evaluation benchmark for community developers. We develop a reliable and robust data generation pipeline to automatically create diverse and high-quality evaluation datasets based on real-world corpora. Our findings demonstrate that the generated testing data in AIR-Bench aligns well with human-labeled testing data, making AIR-Bench a dependable benchmark for evaluating IR models. The resources in AIR-Bench are publicly available at https://github.com/AIR-Bench/AIR-Bench.
Dec-20-2024
- Country:
  - Asia (1.00)
  - Europe (0.67)
  - North America > United States (0.93)
- Genre:
  - Research Report > New Finding (0.68)
- Industry:
  - Health & Medicine
    - Consumer Health (0.92)
    - Diagnostic Medicine (0.93)
    - Pharmaceuticals & Biotechnology (1.00)
    - Therapeutic Area
      - Cardiology/Vascular Diseases (1.00)
      - Endocrinology (0.67)
      - Gastroenterology (1.00)
      - Hematology (0.67)
      - Infections and Infectious Diseases (0.67)
      - Musculoskeletal (1.00)
      - Neurology > Headaches (0.67)
      - Oncology > Lymphoma (0.46)
  - Information Technology (1.00)
- Technology: