Parsing the Switch: LLM-Based UD Annotation for Complex Code-Switched and Low-Resource Languages
Kellert, Olga, Tyagi, Nemika, Imran, Muhammad, Licona-Guevara, Nelvin, Gómez-Rodríguez, Carlos
arXiv.org Artificial Intelligence
Code-switching presents a complex challenge for syntactic analysis, especially in low-resource language settings where annotated data is scarce. While recent work has explored the use of large language models (LLMs) for sequence-level tagging, few approaches systematically investigate how well these models capture syntactic structure in code-switched contexts. Moreover, existing parsers trained on monolingual treebanks often fail to generalize to multilingual and mixed-language input. To address this gap, we introduce the BiLingua Parser, an LLM-based annotation pipeline designed to produce Universal Dependencies (UD) annotations for code-switched text. First, we develop a prompt-based framework for Spanish-English and Spanish-Guaraní data, combining few-shot LLM prompting with expert review. Second, we release two annotated datasets, including the first Spanish-Guaraní UD-parsed corpus. Third, we conduct a detailed syntactic analysis of switch points across language pairs and communicative contexts. Experimental results show that BiLingua Parser achieves up to 95.29% LAS after expert revision, significantly outperforming prior baselines and multilingual parsers. These results show that LLMs, when carefully guided, can serve as practical tools for bootstrapping syntactic resources in under-resourced, code-switched environments. Data and source code are available at https://github.com/N3mika/ParsingProject
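The headline figure above is a labeled attachment score (LAS): the percentage of tokens whose predicted head and dependency label both match the gold annotation. As a minimal sketch of how that metric is computed, and not the authors' evaluation script, the Python below scores one toy sentence. The function name `attachment_scores`, the `Arc` type, and the four-token example are assumptions made for illustration only.

```python
from typing import List, Tuple

# One token's annotation: (index of its head, dependency relation label).
# Head index 0 denotes the artificial root, as in CoNLL-U files.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over token-aligned annotations."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token counts differ")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS counts tokens with the correct head only.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens with the correct head AND the correct label.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Hypothetical code-switched sentence: "I like este libro"
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
# Hypothetical parser output: the last token is attached to the right
# head but mislabeled.
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

In this toy case every head is correct but one label is wrong, so the unlabeled score stays at 100% while LAS drops to 75%. Published scores are normally computed over a whole test set rather than a single sentence.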
Jun-10-2025