CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

Aug-11-2023–arXiv.org Artificial Intelligence

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, built on the Stanza natural language processing pipeline. We describe its main improvements over Stanza and give a detailed description of the model training process for the latest release, 2.1. We also report the pipeline's performance scores for different languages and varieties. CLASSLA-Stanza exhibits consistently high performance across all supported languages and outperforms or expands on its parent pipeline Stanza in all supported tasks. Finally, we present the pipeline's new functionality for efficient processing of web data and the reasons that led to its implementation.

artificial intelligence, machine learning, natural language

Country:
- North America > United States
  - California > Santa Clara County > Palo Alto (0.04)
- Europe
  - Bulgaria (0.04)
  - Slovenia > Central Slovenia
    - Municipality of Ljubljana > Ljubljana (0.04)
  - Middle East > Malta
    - Port Region > Southern Harbour District > Valletta (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (0.72)
    - Grammars & Parsing (0.50)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
