DUMB: A Benchmark for Smart Evaluation of Dutch Models
Wietse de Vries, Martijn Wieling, Malvina Nissim
arXiv.org Artificial Intelligence
We introduce DUMB, the Dutch Model Benchmark. The benchmark comprises a diverse set of datasets for low-, medium- and high-resource tasks. Of its nine tasks, four were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline; this baseline can serve as a fixed reference point in the future, even when different sets of language models are assessed. Through a comparison of 14 pre-trained language models (mono- and multilingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models underperform, and they suggest training larger Dutch models with other architectures and pre-training objectives. At present, the highest performance is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In addition to highlighting the best strategies for training larger Dutch models, DUMB will foster further research on Dutch. A public leaderboard is available at https://dumbench.nl.
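The abstract names Relative Error Reduction but does not spell out its formula. A minimal sketch follows, assuming the common definition in which a task's error is the gap between a score and the maximum attainable score, and RER is the fraction of the baseline's error that a model removes. The function names, the 0 to 100 score scale, and the example task names are illustrative assumptions, not taken from the paper.

```python
from statistics import mean


def relative_error_reduction(model_score, baseline_score, max_score=100.0):
    """Fraction of the baseline's error removed by the model (assumed definition).

    Positive values mean the model beats the baseline, negative values mean
    it does worse, and 0.0 means equal performance.
    """
    baseline_error = max_score - baseline_score
    model_error = max_score - model_score
    if baseline_error <= 0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_error - model_error) / baseline_error


def mean_rer(model_scores, baseline_scores, max_score=100.0):
    """Average the per-task RER over the tasks listed for the baseline."""
    return mean(
        relative_error_reduction(model_scores[task], baseline_scores[task], max_score)
        for task in baseline_scores
    )


# Hypothetical scores for two made-up tasks (not results from the paper).
baseline = {"task_a": 80.0, "task_b": 80.0}
model = {"task_a": 90.0, "task_b": 70.0}
print(relative_error_reduction(90.0, 80.0))  # halves the baseline error on one task
print(mean_rer(model, baseline))             # gain on one task cancels the loss on the other
```

Under this assumed definition, improving from 80 to 90 counts the same as improving from 98 to 99, since both halve the remaining error. A plain mean of raw scores would not capture this when tasks differ in difficulty.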
Oct-13-2023