Hindi-BEIR: A Large Scale Retrieval Benchmark in Hindi

Acharya, Arkadeep, Murthy, Rudra, Kumar, Vishwajeet, Sen, Jaydeep

Aug-18-2024–arXiv.org Artificial Intelligence

Given the large number of Hindi speakers worldwide, there is a pressing need for robust and efficient information retrieval systems for Hindi. Despite ongoing research, no comprehensive benchmark exists for evaluating retrieval models in Hindi. To address this gap, we introduce Hindi-BEIR, a Hindi version of the BEIR benchmark, which includes a subset of English BEIR datasets translated into Hindi, existing Hindi retrieval datasets, and synthetically created retrieval datasets. The benchmark comprises 15 datasets spanning 8 distinct tasks. We evaluate state-of-the-art multilingual retrieval models on this benchmark to identify task- and domain-specific challenges and their impact on retrieval performance. By releasing this benchmark and a set of relevant baselines, we enable researchers to understand the limitations and capabilities of current Hindi retrieval models, promoting advancements in this critical area. The datasets from Hindi-BEIR are publicly available.

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
