BLiSS 1.0: Evaluating Bilingual Learner Competence in Second Language Small Language Models
Gao, Yuan; Salhan, Suchir; Caines, Andrew; Buttery, Paula; Sun, Weiwei
arXiv.org Artificial Intelligence
To bridge the gap between performance-oriented benchmarks and the evaluation of cognitively inspired models, we introduce BLiSS 1.0, a Benchmark of Learner Interlingual Syntactic Structure. Our benchmark operationalizes a new paradigm of selective tolerance, testing whether a model finds a naturalistic learner error more plausible than a matched, artificial error within the same sentence. Constructed from over 2.8 million naturalistic learner sentences, BLiSS provides 136,867 controlled triplets (corrected, learner, artificial) for this purpose. Experiments on a diverse suite of models demonstrate that selective tolerance is a capability distinct from standard grammaticality, with performance clustering strongly by training paradigm. This validates BLiSS as a robust tool for measuring how different training objectives affect a model's alignment with the systematic patterns of human language acquisition.
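The triplet comparison behind selective tolerance can be sketched as follows. This is a minimal illustration, assuming each sentence of a (corrected, learner, artificial) triplet has already been scored with a log-probability by the model under test. The `Triplet` class, the function names, the strict `>` tie handling, the reading of "standard grammaticality" as corrected-beats-learner, and all scores are hypothetical. The paper's actual scoring (for example, any length normalization) may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Triplet:
    """Hypothetical BLiSS-style item: sentence log-probabilities from some LM."""
    corrected: float   # log P(corrected sentence)
    learner: float     # log P(sentence with the naturalistic learner error)
    artificial: float  # log P(sentence with the matched artificial error)


def _as_list(items: Iterable[Triplet]) -> List[Triplet]:
    items = list(items)
    if not items:
        raise ValueError("need at least one triplet")
    return items


def selective_tolerance(items: Iterable[Triplet]) -> float:
    """Fraction of items where the learner error outscores the artificial error.

    Assumption: ties count as failures (strict comparison).
    """
    items = _as_list(items)
    return sum(t.learner > t.artificial for t in items) / len(items)


def grammaticality(items: Iterable[Triplet]) -> float:
    """Assumed 'standard grammaticality' score: corrected beats learner."""
    items = _as_list(items)
    return sum(t.corrected > t.learner for t in items) / len(items)


# Made-up scores for illustration only; these are not results from the paper.
toy = [
    Triplet(corrected=-20.1, learner=-22.4, artificial=-25.0),
    Triplet(corrected=-20.5, learner=-19.9, artificial=-19.2),
    Triplet(corrected=-30.5, learner=-29.8, artificial=-33.1),
]

print(selective_tolerance(toy))  # 2 of 3 items: learner > artificial
print(grammaticality(toy))       # 1 of 3 items: corrected > learner
```

Because the two scores compare different pairs of sentences from the same triplet, a model can rank learner errors above artificial ones while still failing to prefer the corrected sentence, which is the kind of dissociation the abstract reports between selective tolerance and grammaticality.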
Oct-23-2025
- Country:
- Asia
- India > Maharashtra
- Mumbai (0.04)
- South Korea (0.04)
- Europe
- Austria > Vienna (0.14)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Spain (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Education (0.68)