ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors

Oct-16-2024–arXiv.org Artificial Intelligence

In languages without orthographic word boundaries, NLP models perform word segmentation, either as an explicit preprocessing step or as an implicit step in an end-to-end computation. This paper shows that Chinese NLP models are vulnerable to morphological garden path errors: errors caused by a failure to resolve local word segmentation ambiguities using sentence-level morphosyntactic context. We propose a benchmark, ERAS, that tests a model's vulnerability to morphological garden path errors by comparing its behavior on sentences with and without local segmentation ambiguities. Using ERAS, we show that word segmentation models make garden path errors on locally ambiguous sentences, but do not make equivalent errors on unambiguous sentences. We further show that sentiment analysis models with character-level tokenization make implicit garden path errors, even without an explicit word segmentation step in the pipeline. Our results indicate that models' segmentation of Chinese text often fails to account for morphosyntactic context.
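As a rough illustration of the kind of error described above, the sketch below runs a greedy forward maximum-matching segmenter on one locally ambiguous sentence and one unambiguous control. The segmenter, the toy vocabulary, and both example sentences are assumptions chosen for illustration; they are not drawn from ERAS or from the models the paper evaluates.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right longest-match segmentation.

    The segmenter only looks at the next few characters, so it cannot use
    sentence-level context to resolve a local ambiguity.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary (an assumption, not taken from the paper).
VOCAB = {"研究", "研究生", "生命", "的", "起源"}

# Locally ambiguous: the prefix 研究生 ("graduate student") is a word, but the
# sentence as a whole requires 研究 / 生命 / 的 / 起源 ("study the origin of life").
ambiguous = "研究生命的起源"

# Control without the ambiguity: 研究生 / 很 / 多 ("there are many graduate students").
unambiguous = "研究生很多"

garden_path = forward_max_match(ambiguous, VOCAB)
control = forward_max_match(unambiguous, VOCAB)

print(garden_path)  # ['研究生', '命', '的', '起源'] (garden path error)
print(control)      # ['研究生', '很', '多'] (correct)
```

The greedy segmenter commits to 研究生 in both sentences. That choice is correct in the control but leaves the ambiguous sentence with a stranded 命, which mirrors the contrast between locally ambiguous and unambiguous inputs that the benchmark is described as measuring.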



