AITopics

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.33)
Information Technology > Artificial Intelligence > Machine Learning (0.31)

Neural Information Processing Systems, Feb-11-2026, 04:00:29 GMT

Strongly Incremental Constituency Parsing with Graph Neural Networks

Parsing sentences into syntax trees can benefit downstream applications in NLP.

artificial intelligence, machine learning, natural language, (21 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing Systems, Feb-8-2026, 21:46:28 GMT

AT Proofs

We then follow the proof of Theorem 3 in Farnia and Tse [2016]. Our formulation differs from Nowak-Vila et al. [2020] in that we allow probabilistic prediction to be ground truth. Proposition 4. Let G be a multi-graph. We follow the proof of Friesen [2019] for simple graphs. Proposition 5. Let G be a multi-graph.

artificial intelligence, emp, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Neural Information Processing Systems, Dec-24-2025, 21:52:18 GMT

Strongly Incremental Constituency Parsing with Graph Neural Networks

Parsing sentences into syntax trees can benefit downstream applications in NLP. Transition-based parsers build trees by executing actions in a state transition system. They are computationally efficient, and can leverage machine learning to predict actions based on partial trees. However, existing transition-based parsers are predominantly based on the shift-reduce transition system, which does not align with how humans are known to parse sentences. Psycholinguistic research suggests that human parsing is strongly incremental: humans grow a single parse tree by adding exactly one token at each step.
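For illustration only, the following Python sketch executes a toy shift-reduce transition system of the kind the abstract contrasts with strongly incremental parsing. The action format, the function names `run_shift_reduce` and `to_brackets`, and the example sentence are assumptions made here, not taken from the paper.

```python
def run_shift_reduce(tokens, actions):
    """Execute shift/reduce actions and return the single resulting tree.

    Hypothetical action format (an assumption of this sketch):
      ("shift",)            moves the next token from the buffer to the stack
      ("reduce", label, n)  replaces the top n stack items with one labeled node
    """
    stack, buffer = [], list(tokens)
    for action in actions:
        if action[0] == "shift":
            if not buffer:
                raise ValueError("shift on empty buffer")
            stack.append(buffer.pop(0))
        elif action[0] == "reduce":
            _, label, n = action
            if n < 1 or n > len(stack):
                raise ValueError("invalid reduce size")
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))
        else:
            raise ValueError(f"unknown action: {action!r}")
    if buffer or len(stack) != 1:
        raise ValueError("actions do not yield exactly one complete tree")
    return stack[0]


def to_brackets(tree):
    """Render a tree as a bracketed string; leaves are plain token strings."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"


tokens = ["the", "dog", "barks"]
actions = [
    ("shift",), ("shift",), ("reduce", "NP", 2),
    ("shift",), ("reduce", "VP", 1), ("reduce", "S", 2),
]
tree = run_shift_reduce(tokens, actions)
print(to_brackets(tree))
```

In this toy run, six actions are needed for three tokens. The three reduce actions consume no token, so the parser does not add exactly one token per step, which is the property the abstract calls strongly incremental.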

incremental constituency parsing, name change, transition system, (6 more...)

Genre: Research Report (0.60)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Klemen, Matej, Arčon, Tjaša, Terčon, Luka, Robnik-Šikonja, Marko, Dobrovoljc, Kaja

Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis

arXiv.org Artificial Intelligence, Dec-2-2025

Empirical grammar research has become increasingly data-driven, but the systematic analysis of annotated corpora still requires substantial methodological and technical effort. We explore how agentic large language models (LLMs) can streamline this process by reasoning over annotated corpora and producing interpretable, data-grounded answers to linguistic questions. We introduce an agentic framework for corpus-grounded grammatical analysis that integrates concepts such as natural-language task interpretation, code generation, and data-driven reasoning. As a proof of concept, we apply it to Universal Dependencies (UD) corpora, testing it on multilingual grammatical tasks inspired by the World Atlas of Language Structures (WALS). The evaluation spans 13 word-order features and over 170 languages, assessing system performance across three complementary dimensions - dominant-order accuracy, order-coverage completeness, and distributional fidelity - which reflect how well the system generalizes, identifies, and quantifies word-order variations. The results demonstrate the feasibility of combining LLM reasoning with structured linguistic data, offering a first step toward interpretable, scalable automation of corpus-based grammatical inquiry.
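A minimal sketch of the kind of corpus query such a word-order task involves, assuming each token is reduced to an `(id, head, deprel)` triple mirroring the ID, HEAD and DEPREL columns of a UD treebank. The function names, the toy sentences, and the 2:1 dominance threshold are assumptions of this sketch, not the framework described in the abstract.

```python
from collections import Counter


def order_counts(sentences, relation="obj"):
    """Count whether dependents with the given relation precede or follow their head."""
    counts = Counter()
    for sentence in sentences:
        for token_id, head, deprel in sentence:
            if deprel == relation and head != 0:
                key = "dependent-first" if token_id < head else "head-first"
                counts[key] += 1
    return counts


def dominant_order(counts, ratio=2.0):
    """Return the dominant order, or None if neither order is clearly dominant.

    The ratio threshold is an assumed criterion for this sketch.
    """
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (top, n_top), (_, n_second) = ranked
    return top if n_top >= ratio * n_second else None


# Toy data: three verb-object sentences and one object-verb sentence.
vo = [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj")]
ov = [(1, 3, "nsubj"), (2, 3, "obj"), (3, 0, "root")]
counts = order_counts([vo, vo, vo, ov])
print(dict(counts), dominant_order(counts))
```

The raw counts correspond loosely to the distributional view of a feature, the set of observed keys to order coverage, and `dominant_order` to the dominant-order decision.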

accuracy, large language model, machine learning, (19 more...)

2512.00214

Country: Europe > Slovenia (0.15)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-9-2025

Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser

Chistova, Elena

We introduce UniRST, the first unified RST-style discourse parser capable of handling 18 treebanks in 11 languages without modifying their relation inventories. To overcome inventory incompatibilities, we propose and evaluate two training strategies: Multi-Head, which assigns a separate relation classification layer per inventory, and Masked-Union, which enables shared parameter training through selective label masking. We first benchmark mono-treebank parsing with a simple yet effective augmentation technique for low-resource settings. We then train a unified model and show that (1) the parameter-efficient Masked-Union approach is also the strongest, and (2) UniRST outperforms 16 of 18 mono-treebank baselines, demonstrating the advantages of single-model, multilingual, end-to-end discourse parsing across diverse resources.
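One plausible reading of "selective label masking" can be sketched as follows: a single classifier scores the union of all relation labels, and labels outside the current treebank's inventory are masked out before normalization. This is an assumed interpretation for illustration, not the UniRST implementation. The treebank names, labels, and function names are hypothetical.

```python
import math


def build_union(inventories):
    """Return the sorted union of labels and one boolean mask per treebank."""
    union = sorted({label for labels in inventories.values() for label in labels})
    masks = {
        name: [label in set(labels) for label in union]
        for name, labels in inventories.items()
    }
    return union, masks


def masked_log_softmax(logits, allowed):
    """Log-softmax restricted to allowed labels; masked labels get -inf."""
    if not any(allowed):
        raise ValueError("at least one label must be allowed")
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    peak = max(masked)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in masked))
    return [x - log_z for x in masked]


# Hypothetical inventories for two treebanks.
inventories = {
    "treebank_a": ["Elaboration", "Contrast"],
    "treebank_b": ["Contrast", "Cause"],
}
union, masks = build_union(inventories)
log_probs = masked_log_softmax([2.0, 1.0, 0.5], masks["treebank_a"])
print(union, log_probs)
```

Under this reading, the shared output layer is trained on every treebank, while the mask keeps probability mass off labels that a given treebank never uses.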

artificial intelligence, machine learning, natural language, (19 more...)

2510.06427

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Jumelet, Jaap, Weissweiler, Leonie, Nivre, Joakim, Bisazza, Arianna

MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs

arXiv.org Artificial Intelligence, Aug-25-2025

We introduce MultiBLiMP 1.0, a massively multilingual benchmark of linguistic minimal pairs, covering 101 languages and 2 types of subject-verb agreement, containing more than 128,000 minimal pairs. Our minimal pairs are created using a fully automated pipeline, leveraging the large-scale linguistic resources of Universal Dependencies and UniMorph. MultiBLiMP 1.0 evaluates the abilities of LLMs at an unprecedented multilingual scale, and highlights the shortcomings of the current state-of-the-art in modelling low-resource languages.
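Minimal-pair benchmarks are commonly scored by checking whether a model assigns a higher score to the grammatical sentence than to its ungrammatical counterpart. The sketch below assumes that scoring rule. The sentences, the scores, and the `toy_scores` lookup standing in for a language model are all made up for illustration.

```python
def minimal_pair_accuracy(pairs, score):
    """Fraction of pairs where the grammatical sentence gets the higher score.

    `pairs` holds (grammatical, ungrammatical) sentence tuples and `score`
    is any callable returning a number such as a sentence log-probability.
    """
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical log-probabilities standing in for a real language model.
toy_scores = {
    "The dogs bark.": -10.0,
    "The dogs barks.": -12.5,
    "The cat sleeps.": -11.0,
    "The cat sleep.": -10.5,
}
pairs = [
    ("The dogs bark.", "The dogs barks."),
    ("The cat sleeps.", "The cat sleep."),
]
accuracy = minimal_pair_accuracy(pairs, toy_scores.__getitem__)
print(accuracy)
```

Here the toy scorer prefers the grammatical sentence in the first pair only, so it is correct on one of two pairs.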

computational linguistic, large language model, machine learning, (18 more...)

2504.02768

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Neural Information Processing Systems, Aug-14-2025, 19:03:39 GMT

4f92d2f498b88f1bd43732312272967a-Supplemental-Conference.pdf

constraint, emp, polytope, (16 more...)

Country: North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ezquerro, Ana, Vilares, David, Yli-Jyrä, Anssi, Gómez-Rodríguez, Carlos

Hierarchical Bracketing Encodings for Dependency Parsing as Tagging

arXiv.org Artificial Intelligence, Jul-11-2025

We present a family of encodings for sequence labeling dependency parsing, based on the concept of hierarchical bracketing. We prove that the existing 4-bit projective encoding belongs to this family, but that it is suboptimal in the number of labels used to encode a tree. We derive an optimal hierarchical bracketing, which minimizes the number of symbols used and encodes projective trees using only 12 distinct labels (vs. 16 for the 4-bit encoding). We also extend optimal hierarchical bracketing to support arbitrary non-projectivity in a more compact way than previous encodings. Our new encodings yield competitive accuracy on a diverse set of treebanks.
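To make the idea of dependency parsing as tagging concrete, the sketch below uses a much simpler relative-offset encoding, in which each token's label is the signed distance to its head. It is not the hierarchical bracketing encoding from the abstract, and its label set grows with sentence length rather than staying fixed at 12 or 16 labels. The function names and the example are assumptions of this sketch.

```python
def encode_heads(heads):
    """Turn 1-based head indices (0 = root) into one string label per token."""
    labels = []
    for position, head in enumerate(heads, start=1):
        labels.append("ROOT" if head == 0 else f"{head - position:+d}")
    return labels


def decode_labels(labels):
    """Invert encode_heads, recovering the head index of every token."""
    heads = []
    for position, label in enumerate(labels, start=1):
        head = 0 if label == "ROOT" else position + int(label)
        if head < 0 or head > len(labels) or head == position:
            raise ValueError(f"label {label!r} at position {position} is invalid")
        heads.append(head)
    return heads


# "She reads books": both "She" and "books" depend on "reads", the root.
heads = [2, 0, 2]
labels = encode_heads(heads)
print(labels, decode_labels(labels))
```

Because every token receives exactly one label, a standard sequence tagger can predict the tree. The encodings in the abstract address the size of the label set, which this simple scheme leaves unbounded.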

arc, artificial intelligence, natural language, (17 more...)

2505.11693

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.61)

Arnett, Catherine, Hudspeth, Marisa, O'Connor, Brendan

Evaluating Morphological Alignment of Tokenizers in 70 Languages

arXiv.org Artificial Intelligence, Jul-10-2025

While tokenization is a key step in language modeling, with effects on model training and performance, it remains unclear how to effectively evaluate tokenizer quality. One proposed dimension of tokenizer quality is the extent to which tokenizers preserve linguistically meaningful subwords, aligning token boundaries with morphological boundaries within a word. We expand MorphScore (Arnett & Bergen, 2025), which previously covered 22 languages, to support a total of 70 languages. The updated MorphScore offers more flexibility in evaluation and addresses some of the limitations of the original version. We then correlate our alignment scores with downstream task performance for five pre-trained language models on seven tasks, with at least one task in each of the languages in our sample. We find that morphological alignment does not explain very much variance in model performance, suggesting that morphological alignment alone does not measure dimensions of tokenization quality relevant to model performance.
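The notion of aligning token boundaries with morphological boundaries can be illustrated with a simple boundary precision and recall computation over one word. This is a generic sketch and not the MorphScore metric itself. The function names and the example segmentations are assumptions.

```python
def boundaries(pieces):
    """Character offsets of the internal boundaries between consecutive pieces."""
    positions, offset = set(), 0
    for piece in pieces[:-1]:
        offset += len(piece)
        positions.add(offset)
    return positions


def boundary_alignment(morphemes, tokens):
    """Return (precision, recall) of token boundaries against morpheme boundaries.

    Both segmentations must spell the same word. A score with an empty
    denominator is reported as 1.0 by convention in this sketch.
    """
    if "".join(morphemes) != "".join(tokens):
        raise ValueError("segmentations do not cover the same word")
    gold, predicted = boundaries(morphemes), boundaries(tokens)
    hits = len(gold & predicted)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(gold) if gold else 1.0
    return precision, recall


gold = ["un", "break", "able"]
print(boundary_alignment(gold, ["un", "breakable"]))
print(boundary_alignment(gold, ["unb", "reak", "able"]))
```

The first tokenization places its only boundary on a morpheme boundary but misses the second one. The second tokenization matches one of the two gold boundaries and adds a spurious one.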

computational linguistic, large language model, machine learning, (18 more...)

2507.06378

Country:

Europe (1.00)
North America > United States (0.92)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.35)