Comprehensive Evaluation on Lexical Normalization: Boundary-Aware Approaches for Unsegmented Languages
Shohei Higashiyama, Masao Utiyama
arXiv.org Artificial Intelligence
Lexical normalization research has sought to tackle the challenge of processing informal expressions in user-generated text, yet the absence of comprehensive evaluations leaves it unclear which methods excel across multiple perspectives. Focusing on unsegmented languages, we make three key contributions: (1) creating a large-scale, multi-domain Japanese normalization dataset, (2) developing normalization methods based on state-of-the-art pretrained models, and (3) conducting experiments across multiple evaluation perspectives. Our experiments show that both encoder-only and decoder-only approaches achieve promising results in terms of accuracy and efficiency.
Dec-2-2025
- Genre:
- Research Report > New Finding (0.67)