MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch
Banar, Nikolay, Lotfi, Ehsan, Van Nooten, Jens, Arhiliuc, Cristina, Kliocaite, Marija, Daelemans, Walter
arXiv.org Artificial Intelligence
Recently, embedding resources, including models, benchmarks, and datasets, have been widely released to support a variety of languages. However, the Dutch language remains underrepresented, typically comprising only a small fraction of the published multilingual resources. To address this gap and encourage the further development of Dutch embeddings, we introduce new resources for their evaluation and generation. First, we introduce the Massive Text Embedding Benchmark for Dutch (MTEB-NL), which includes both existing Dutch datasets and newly created ones, covering a wide range of tasks. Second, we provide a training dataset compiled from available Dutch retrieval datasets, complemented with synthetic data generated by large language models to expand task coverage beyond retrieval. Finally, we release a series of E5-NL models: compact yet efficient embedding models that demonstrate strong performance across multiple tasks. We make our resources publicly available through the Hugging Face Hub and the MTEB package.
Sep-17-2025
- Country:
- Asia
- Europe
- Belgium > Flanders
- Antwerp Province > Antwerp (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Netherlands (0.28)
- Slovenia > Drava
- Municipality of Benedikt > Benedikt (0.04)
- North America
- Canada > British Columbia
- Dominican Republic (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- New York > New York County
- New York City (0.04)
- Texas > Travis County
- Austin (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- Genre:
- Research Report (0.82)
- Industry:
- Government (0.92)
- Health & Medicine (0.68)
- Technology: