AITopics

2508.00121

Country:

Europe (0.95)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Inostroza, Demian, Mistica, Mel, Vylomova, Ekaterina, Guest, Chris, Kurniawan, Kemal

A Joint Multitask Model for Morpho-Syntactic Parsing

arXiv.org Artificial IntelligenceAug-21-2025

We present a joint multitask model for the UniDive 2025 Morpho-Syntactic Parsing shared task, where systems predict both morphological and syntactic analyses following novel UD annotation scheme. Our system uses a shared XLM-RoBERTa encoder with three specialized decoders for content word identification, dependency parsing, and morphosyntactic feature prediction. Our model achieves the best overall performance on the shared task's leaderboard covering nine typologically diverse languages, with an average MSLAS score of 78.7 percent, LAS of 80.1 percent, and Feats F1 of 90.3 percent. Our ablation studies show that matching the task's gold tokenization and content word identification are crucial to model performance. Error analysis reveals that our model struggles with core grammatical cases (particularly Nom-Acc) and nominal features across languages.

artificial intelligence, content word identification, natural language, (14 more...)

2508.14307

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Eui Chul Shin, Miltiadis Allamanis, Marc Brockschmidt, Alex Polozov

Program Synthesis and Semantic Parsing with Learned Code Idioms

Neural Information Processing SystemsAug-20-2025, 03:03:21 GMT

Neural Information Processing Systems http://nips.cc/

idiom, proceedings, synthesis, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Jakubův, Jan, Janota, Mikoláš

Quantifier Instantiations: To Mimic or To Revolt?

arXiv.org Artificial IntelligenceAug-20-2025

Quantified formulas pose a significant challenge for Satisfiability Modulo Theories (SMT) solvers due to their inherent undecidability. Existing instantiation techniques, such as e-matching, syntax-guided, model-based, conflict-based, and enumerative methods, often complement each other. This paper introduces a novel instantiation approach that dynamically learns from these techniques during solving. By treating observed instantiations as samples from a latent language, we use probabilistic context-free grammars to generate new, similar terms. Our method not only mimics successful past instantiations but also explores diversity by optionally inverting learned term probabilities, aiming to balance exploitation and exploration in quantifier reasoning.

artificial intelligence, instantiation, natural language, (16 more...)

2508.13811

Country: North America > United States (0.94)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Neural Information Processing SystemsAug-19-2025, 17:32:21 GMT

Learning Structure from the Ground up--Hierarchical Representation Learning by Chunking

Sequential data in our everyday life is often hierarchically structured.

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.68)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
(2 more...)

Neural Information Processing SystemsAug-19-2025, 12:49:07 GMT

Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing Shentong Mo Carnegie Mellon University Y apeng Tian University of Texas at Dallas

The audio-visual video parsing task aims to parse a video into modality-and category-aware temporal segments. Previous work mainly focuses on weakly-supervised approaches, which learn from video-level event labels.

artificial intelligence, machine learning, natural language, (15 more...)

Country: North America > United States > Texas (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.64)

arXiv.org Artificial IntelligenceAug-19-2025

It takes a village to write a book: Mapping anonymous contributions in Stephen Langton's Quaestiones Theologiae

Maliszewski, Jan

While the indirect evidence suggests that already in the early scholastic period the literary production based on records of oral teaching (so-called reportationes) was not uncommon, there are very few sources commenting on the practice. This paper details the design of a study applying stylometric techniques of authorship attribution to a collection developed from reportationes -- Stephen Langton's Quaestiones Theologiae -- aiming to uncover layers of editorial work and thus validate some hypotheses regarding the collection's formation. Following Camps, Clérice, and Pinche (2021), I discuss the implementation of an HTR pipeline and stylometric analysis based on the most frequent words, POS tags, and pseudo-affixes. The proposed study will offer two methodological gains relevant to computational research on the scholastic tradition: it will directly compare performance on manually composed and automatically extracted data, and it will test the validity of transformer-based OCR and automated transcription alignment for workflows applied to scholastic Latin corpora. If successful, this study will provide an easily reusable template for the exploratory analysis of collaborative literary production stemming from medieval universities.

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.1017/chr.2025.10004

2508.1283

Country: Europe (0.68)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial IntelligenceAug-19-2025

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Lipkin, Benjamin, LeBrun, Benjamin, Vigly, Jacob Hoover, Loula, João, MacIver, David R., Du, Li, Eisner, Jason, Cotterell, Ryan, Mansinghka, Vikash, O'Donnell, Timothy J., Lew, Alexander K., Vieira, Tim

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is achieved through token masking: looping over the vocabulary and excluding non-conforming tokens. There are two important problems with this approach. (i) Evaluating the constraint on every token can be prohibitively expensive -- LM vocabularies often exceed $100,000$ tokens. (ii) LCD can distort the global distribution over strings, sampling tokens based only on local information, even if they lead down dead-end paths. This work introduces a new algorithm that addresses both these problems. First, to avoid evaluating a constraint on the full vocabulary at each step of generation, we propose an adaptive rejection sampling algorithm that typically requires orders of magnitude fewer constraint evaluations. Second, we show how this algorithm can be extended to produce low-variance, unbiased estimates of importance weights at a very small additional cost -- estimates that can be soundly used within previously proposed sequential Monte Carlo algorithms to correct for the myopic behavior of local constraint enforcement. Through extensive empirical evaluation in text-to-SQL, molecular synthesis, goal inference, pattern matching, and JSON domains, we show that our approach is superior to state-of-the-art baselines, supporting a broader class of constraints and improving both runtime and performance. Additional theoretical and empirical analyses show that our method's runtime efficiency is driven by its dynamic use of computation, scaling with the divergence between the unconstrained and constrained LM, and as a consequence, runtime improvements are greater for better models.

constraint, machine learning, natural language, (17 more...)

2504.0541

Country:

North America > United States (0.46)
North America > Canada (0.27)

Genre:

Research Report (1.00)
Workflow (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Neural Information Processing SystemsAug-17-2025, 22:36:22 GMT

dd17e652cd2a08fdb8bf7f68e2ad3814-Paper.pdf

artificial intelligence, machine learning, natural language, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsAug-17-2025, 13:56:37 GMT

Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning Maxwell Nye

In this work, we seek a lightweight, training-free means of improving existing System 1-like sequence models by adding System 2-inspired logical reasoning.

artificial intelligence, machine learning, natural language, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)
Europe > Germany > Saarland (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.97)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.68)