AITopics | Grammars & Parsing

Collaborating Authors

Grammars & Parsing

News Overviews Instructional Materials AI-Alerts Classics

Enhancing Korean Dependency Parsing with Morphosyntactic Features

Park, Jungyeul, Chen, Yige, Kim, Kyuwon, Lim, KyungTae, Park, Chulwoo

arXiv.org Artificial IntelligenceMar-26-2025

This paper introduces UniDive for Korean, an integrated framework that bridges Universal Dependencies (UD) and Universal Morphology (UniMorph) to enhance the representation and processing of Korean {morphosyntax}. Korean's rich inflectional morphology and flexible word order pose challenges for existing frameworks, which often treat morphology and syntax separately, leading to inconsistencies in linguistic analysis. UniDive unifies syntactic and morphological annotations by preserving syntactic dependencies while incorporating UniMorph-derived features, improving consistency in annotation. We construct an integrated dataset and apply it to dependency parsing, demonstrating that enriched morphosyntactic features enhance parsing accuracy, particularly in distinguishing grammatical relations influenced by morphology. Our experiments, conducted with both encoder-only and decoder-only models, confirm that explicit morphological information contributes to more accurate syntactic analysis.

annotation, artificial intelligence, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.21029

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950

Fang, Zhao, Wu, Liang-Chun, Kong, Xuening, Stewart, Spencer Dean

arXiv.org Artificial IntelligenceMar-25-2025

This paper compares large language models (LLMs) and traditional natural language processing (NLP) tools for performing word segmentation, part-of-speech (POS) tagging, and named entity recognition (NER) on Chinese texts from 1900 to 1950. Historical Chinese documents pose challenges for text analysis due to their logographic script, the absence of natural word boundaries, and significant linguistic changes. Using a sample dataset from the Shanghai Library Republican Journal corpus, traditional tools such as Jieba and spaCy are compared to LLMs, including GPT-4o, Claude 3.5, and the GLM series. The results show that LLMs outperform traditional methods in all metrics, albeit at considerably higher computational costs, highlighting a trade-off between accuracy and efficiency. Additionally, LLMs better handle genre-specific challenges such as poetry and temporal variations (i.e., pre-1920 versus post-1920 texts), demonstrating that their contextual learning capabilities can advance NLP approaches to historical texts by reducing the need for domain-specific training data.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2503.19844

Country:

Asia > China > Shanghai > Shanghai (0.26)
North America > United States > Illinois > Cook County > Chicago (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Machine-assisted writing evaluation: Exploring pre-trained language models in analyzing argumentative moves

Qin, Wenjuan, Wang, Weiran, Yang, Yuming, Gui, Tao

arXiv.org Artificial IntelligenceMar-24-2025

The study investigates the efficacy of pre-trained language models (PLMs) in analyzing argumentative moves in a longitudinal learner corpus. Prior studies on argumentative moves often rely on qualitative analysis and manual coding, limiting their efficiency and generalizability. The study aims to: 1) to assess the reliability of PLMs in analyzing argumentative moves; 2) to utilize PLM-generated annotations to illustrate developmental patterns and predict writing quality. A longitudinal corpus of 1643 argumentative texts from 235 English learners in China is collected and annotated into six move types: claim, data, counter-claim, counter-data, rebuttal, and non-argument. The corpus is divided into training, validation, and application sets annotated by human experts and PLMs. We use BERT as one of the implementations of PLMs. The results indicate a robust reliability of PLMs in analyzing argumentative moves, with an overall F1 score of 0.743, surpassing existing models in the field. Additionally, PLM-labeled argumentative moves effectively capture developmental patterns and predict writing quality. Over time, students exhibit an increase in the use of data and counter-claims and a decrease in non-argument moves. While low-quality texts are characterized by a predominant use of claims and data supporting only oneside position, mid- and high-quality texts demonstrate an integrative perspective with a higher ratio of counter-claims, counter-data, and rebuttals. This study underscores the transformative potential of integrating artificial intelligence into language education, enhancing the efficiency and accuracy of evaluating students' writing. The successful application of PLMs can catalyze the development of educational technology, promoting a more data-driven and personalized learning environment that supports diverse educational needs.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.19279

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Palau (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.66)
Education > Educational Setting > K-12 Education > Secondary School (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

ParsiPy: NLP Toolkit for Historical Persian Texts in Python

Farsi, Farhan, Fazel, Parnian, Haghighi, Sepand, Sabouri, Sadra, Goshtasb, Farzaneh, Hajipour, Nadia, Asgari, Ehsaneddin, Sameti, Hossein

arXiv.org Artificial IntelligenceMar-22-2025

The study of historical languages presents unique challenges due to their complex orthographic systems, fragmentary textual evidence, and the absence of standardized digital representations of text in those languages. Tackling these challenges needs special NLP digital tools to handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy, an NLP toolkit designed to facilitate the analysis of historical Persian languages by offering modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Persian) texts, highlighting its potential for expanding computational methods in the study of historical languages. Through this work, we contribute to computational philology, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.1781

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > California (0.14)
Europe > Bulgaria > Varna Province > Varna (0.05)
(8 more...)

Genre: Research Report (0.52)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.91)

Add feedback

Interview with AAAI Fellow Roberto Navigli: multilingual natural language processing

AIHubMar-21-2025, 08:45:45 GMT

Each year the AAAI recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. Over the course of the next few months, we'll be talking to some of the 2025 AAAI Fellows. In this interview we hear from Roberto Navigli, Sapienza University of Rome, who was elected as a Fellow for "significant contributions to multilingual Natural Language Understanding, and development of widely recognized methods for knowledge resource construction, text disambiguation, and semantic parsing". We find out about his career path, some big research projects he's led, and why it's important to follow your passion. My area of research is natural language processing (NLP).

large language model, machine learning, natural language, (15 more...)

AIHub

Country: Europe > Italy > Lazio > Rome (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Towards Generating Automatic Anaphora Annotations

Taji, Dima, Zeman, Daniel

arXiv.org Artificial IntelligenceMar-21-2025

Training models that can perform well on various NLP tasks require large amounts of data, and this becomes more apparent with nuanced tasks such as anaphora and conference resolution. To combat the prohibitive costs of creating manual gold annotated data, this paper explores two methods to automatically create datasets with coreferential annotations; direct conversion from existing datasets, and parsing using multilingual models capable of handling new and unseen languages. The paper details the current progress on those two fronts, as well as the challenges the efforts currently face, and our approach to overcoming these challenges.

annotation, artificial intelligence, natural language, (14 more...)

arXiv.org Artificial Intelligence

2503.09417

Country:

Europe > Czechia > Prague (0.05)
Europe > Portugal > Lisbon > Lisbon (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(5 more...)

Genre:

Research Report (0.50)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Incomplete Utterance Rewriting with Editing Operation Guidance and Utterance Augmentation

Cao, Zhiyu, Li, Peifeng, Fan, Yaxin, Zhu, Qiaoming

arXiv.org Artificial IntelligenceMar-20-2025

Although existing fashionable generation methods on Incomplete Utterance Rewriting (IUR) can generate coherent utterances, they often result in the inclusion of irrelevant and redundant tokens in rewritten utterances due to their inability to focus on critical tokens in dialogue context. Furthermore, the limited size of the training datasets also contributes to the insufficient training of the IUR model. To address the first issue, we propose a multi-task learning framework EO-IUR (Editing Operation-guided Incomplete Utterance Rewriting) that introduces the editing operation labels generated by sequence labeling module to guide generation model to focus on critical tokens. Furthermore, we introduce a token-level heterogeneous graph to represent dialogues. To address the second issue, we propose a two-dimensional utterance augmentation strategy, namely editing operation-based incomplete utterance augmentation and LLM-based historical utterance augmentation. The experimental results on three datasets demonstrate that our EO-IUR outperforms previous state-of-the-art (SOTA) baselines in both open-domain and task-oriented dialogue. The code will be available at https://github.com/Dewset/EO-IUR.

large language model, machine learning, utterance, (19 more...)

arXiv.org Artificial Intelligence

2503.16043

Country:

Asia > China > Shandong Province > Qingdao (0.05)
North America > Canada > Ontario > Toronto (0.04)
Asia > China > Hong Kong (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Strategic resource allocation in memory encoding: An efficiency principle shaping language processing

Xu, Weijie, Futrell, Richard

arXiv.org Artificial IntelligenceMar-18-2025

How is the limited capacity of working memory efficiently used to support human linguistic behaviors? In this paper, we investigate strategic resource allocation as an efficiency principle for memory encoding in sentence processing. The idea is that working memory resources are dynamically and strategically allocated to prioritize novel and unexpected information, enhancing their representations to make them less susceptible to memory decay and interference. Theoretically, from a resource-rational perspective, we argue that this efficiency principle naturally arises from two functional assumptions about working memory, namely, its limited capacity and its noisy representation. Empirically, through naturalistic corpus data, we find converging evidence for strategic resource allocation in the context of dependency locality from both the production and the comprehension side, where non-local dependencies with less predictable antecedents are associated with reduced locality effect. However, our results also reveal considerable cross-linguistic variability, highlighting the need for a closer examination of how strategic resource allocation, as a universal efficiency principle, interacts with language-specific phrase structures.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.14728

Country:

Asia > Indonesia > Bali (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.66)

Add feedback

Second language Korean Universal Dependency treebank v1.2: Focus on data augmentation and annotation scheme refinement

Sung, Hakyung, Shin, Gyu-Ho

arXiv.org Artificial IntelligenceMar-18-2025

We expand the second language (L2) Korean Universal Dependencies (UD) treebank with 5,454 manually annotated sentences. The annotation guidelines are also revised to better align with the UD framework. Using this enhanced treebank, we fine-tune three Korean language models and evaluate their performance on in-domain and out-of-domain L2-Korean datasets. The results show that fine-tuning significantly improves their performance across various metrics, thus highlighting the importance of using well-tailored L2 datasets for fine-tuning first-language-based, general-purpose language models for the morphosyntactic analysis of L2 data.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.14718

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Oregon (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Valid Text-to-SQL Generation with Unification-based DeepStochLog

Jiao, Ying, De Raedt, Luc, Marra, Giuseppe

arXiv.org Artificial IntelligenceMar-17-2025

Large language models have been used to translate natural language questions to SQL queries. Without hard constraints on syntax and database schema, they occasionally produce invalid queries that are not executable. These failures limit the usage of these systems in real-life scenarios. We propose a neurosymbolic framework that imposes SQL syntax and schema constraints with unification-based definite clause grammars and thus guarantees the generation of valid queries. Our framework also builds a bi-directional interface to language models to leverage their natural language understanding abilities. The evaluation results on a subset of SQL grammars show that all our output queries are valid. This work is the first step towards extending language models with unification-based grammars. We demonstrate this extension enhances the validity, execution accuracy, and ground truth alignment of the underlying language model by a large margin.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-71167-1_17

2503.13342

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > Sweden > Örebro County > Örebro (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback