Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval

Yang, Jinrui, Baldwin, Timothy, Cohn, Trevor

Nov-3-2023–arXiv.org Artificial Intelligence

We present Multi-EuP, a new multilingual benchmark dataset comprising 22K multilingual documents collected from the European Parliament, spanning 24 languages. The dataset is designed to investigate fairness in multilingual information retrieval (IR), supporting analysis of both language and demographic bias in ranking. It offers an authentic multilingual corpus, with topics translated into all 24 languages, as well as cross-lingual relevance judgments. It also provides rich demographic information associated with its documents, facilitating the study of demographic bias. We report the effectiveness of Multi-EuP for benchmarking both monolingual and multilingual IR. We also conduct a preliminary experiment on language bias caused by the choice of tokenization strategy.

Country:
- North America
  - Dominican Republic (0.04)
  - United States > New York
    - New York County > New York City (0.04)
- Europe
  - Belgium (0.05)
  - Ireland (0.04)
  - Bulgaria (0.04)
  - Poland (0.04)
  - Germany (0.04)
  - Netherlands (0.04)
  - Denmark (0.04)
  - Finland (0.04)
  - Slovakia (0.04)
  - Slovenia (0.04)
  - France (0.04)
  - Italy (0.04)
  - Greece (0.04)
  - Latvia (0.04)
  - Lithuania (0.04)
  - Estonia (0.04)
  - Romania (0.04)
  - Croatia (0.04)
  - Sweden (0.04)
  - Czechia (0.04)
  - United Kingdom (0.04)
  - Hungary (0.04)
  - Spain > Valencian Community
    - Valencia Province > Valencia (0.04)
    - Alicante Province > Alicante (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Middle East
    - Malta (0.04)
    - Cyprus (0.04)
- Asia
  - Thailand > Phuket
    - Phuket (0.04)
  - Middle East
    - UAE (0.04)
    - Israel (0.04)

Genre:
- Research Report (1.00)

Industry:
- Government > Regional Government (0.46)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)
