NoCoLA: The Norwegian Corpus of Linguistic Acceptability
Jentoft, Matias; Samuel, David
arXiv.org Artificial Intelligence
While there has been a surge of large language models for Norwegian in recent years, we lack any tool to evaluate their understanding of grammaticality. We present two new Norwegian datasets for this task. NoCoLA_class is a supervised binary classification task where the goal is to discriminate between acceptable and non-acceptable sentences. On the other hand, NoCoLA_zero is a purely diagnostic task for evaluating the grammatical judgement of a language model in a completely zero-shot manner, i.e. without any further training. In this paper, we describe both datasets in detail, show how to use them for different flavors of language models, and conduct a comparative study of the existing Norwegian language models.
Jun-13-2023