Comprehensive Evaluation on Lexical Normalization: Boundary-Aware Approaches for Unsegmented Languages

Dec-2-2025–arXiv.org Artificial Intelligence

Lexical normalization research has sought to tackle the challenge of processing informal expressions in user-generated text, yet the absence of comprehensive evaluations leaves it unclear which methods excel across multiple perspectives. Focusing on unsegmented languages, we make three key contributions: (1) creating a large-scale, multi-domain Japanese normalization dataset, (2) developing normalization methods based on state-of-the-art pretrained models, and (3) conducting experiments across multiple evaluation perspectives. Our experiments show that both encoder-only and decoder-only approaches achieve promising results in both accuracy and efficiency.

computational linguistic, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

Dec-2-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Minnesota (0.28)
  - New Mexico (0.28)
- Europe > Middle East
  - Malta (0.28)
- Asia > Japan
  - Honshū (0.46)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology
  - Communications > Social Media (0.68)
  - Artificial Intelligence
    - Natural Language
      - Large Language Model (0.95)
      - Text Processing (0.92)
      - Grammars & Parsing (0.68)
    - Machine Learning > Neural Networks
      - Deep Learning (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found