AITopics | Grammars & Parsing

Collaborating Authors

Grammars & Parsing

News Overviews Instructional Materials AI-Alerts Classics

Developing a Modular Compiler for a Subset of a C-like Language

Dutta, Debasish, Sonowal, Neeharika, Hazarika, Irani

arXiv.org Artificial IntelligenceJan-8-2025

The paper introduces the development of a modular compiler for a subset of a C-like language, which addresses the challenges in constructing a compiler for high-level languages. This modular approach will allow developers to modify a language by adding or removing subsets as required, resulting in a minimal and memory-efficient compiler. The development process is divided into small, incremental steps, where each step yields a fully functioning compiler for an expanding subset of the language. The paper outlines the iterative developmental phase of the compiler, emphasizing progressive enhancements in capabilities and functionality. Adherence to industry best practices of modular design, code reusability, and documentation has enabled the resulting compiler's functional efficiency, maintainability, and extensibility. The compiler proved to be effective not only in managing the language structure but also in developing optimized code, which demonstrates its practical usability. This was also further assessed using the compiler on a tiny memory-deficient single-board computer, again showing the compiler's efficiency and suitability for resource-constrained devices.

artificial intelligence, natural language, programming language, (19 more...)

arXiv.org Artificial Intelligence

2501.04503

Genre: Research Report (0.40)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)

Add feedback

Semantically Cohesive Word Grouping in Indian Languages

Karthika, N J, Patra, Adyasha, Naidu, Nagasai Saketh, Bhattacharya, Arnab, Ramakrishnan, Ganesh, Dangarikar, Chaitali

arXiv.org Artificial IntelligenceJan-7-2025

Indian languages are inflectional and agglutinative and typically follow clause-free word order. The structure of sentences across most major Indian languages are similar when their dependency parse trees are considered. While some differences in the parsing structure occur due to peculiarities of a language or its preferred natural way of conveying meaning, several apparent differences are simply due to the granularity of representation of the smallest semantic unit of processing in a sentence. The semantic unit is typically a word, typographically separated by whitespaces. A single whitespace-separated word in one language may correspond to a group of words in another. Hence, grouping of words based on semantics helps unify the parsing structure of parallel sentences across languages and, in the process, morphology. In this work, we propose word grouping as a major preprocessing step for any computational or linguistic processing of sentences for Indian languages. Among Indian languages, since Hindi is one of the least agglutinative, we expect it to benefit the most from word-grouping. Hence, in this paper, we focus on Hindi to study the effects of grouping. We perform quantitative assessment of our proposal with an intrinsic method that perturbs sentences by shuffling words as well as an extrinsic evaluation that verifies the importance of word grouping for the task of Machine Translation (MT) using decomposed prompting. We also qualitatively analyze certain aspects of the syntactic structure of sentences. Our experiments and analyses show that the proposed grouping technique brings uniformity in the syntactic structures, as well as aids underlying NLP tasks.

artificial intelligence, indian language, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.03988

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study

Krückl, Xaver Maria, Blaschke, Verena, Plank, Barbara

arXiv.org Artificial IntelligenceJan-7-2025

Reliable slot and intent detection (SID) is crucial in natural language understanding for applications like digital assistants. Encoder-only transformer models fine-tuned on high-resource languages generally perform well on SID. However, they struggle with dialectal data, where no standardized form exists and training data is scarce and costly to produce. We explore zero-shot transfer learning for SID, focusing on multiple Bavarian dialects, for which we release a new dataset for the Munich dialect. We evaluate models trained on auxiliary tasks in Bavarian, and compare joint multi-task learning with intermediate-task training. We also compare three types of auxiliary tasks: token-level syntactic tasks, named entity recognition (NER), and language modelling. We find that the included auxiliary tasks have a more positive effect on slot filling than intent classification (with NER having the most positive effect), and that intermediate-task training yields more consistent performance gains. Our best-performing approach improves intent classification performance on Bavarian dialects by 5.1 and slot filling F1 by 8.4 percentage points.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.03863

Country:

North America > United States > Minnesota (0.28)
Asia > Middle East > UAE (0.28)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.25)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Add feedback

Syntactic Evolution in Language Usage

Kumar, Surbhit

arXiv.org Artificial IntelligenceJan-4-2025

This research aims to investigate the dynamic nature of linguistic style throughout various stages of life, from post teenage to old age. By employing linguistic analysis tools and methodologies, the study will delve into the intricacies of how individuals adapt and modify their language use over time. The research uses a data set of blogs from blogger.com from 2004 and focuses on English for syntactic analysis. The findings of this research can have implications for linguistics, psychology, and communication studies, shedding light on the intricate relationship between age and language.

age group, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.02392

Country:

Oceania > Australia (0.14)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

The Proof is in the Almond Cookies

van Trijp, Remi, Beuls, Katrien, Van Eecke, Paul

arXiv.org Artificial IntelligenceJan-3-2025

This paper presents a case study on how to process cooking recipes (and more generally, how-to instructions) in a way that makes it possible for a robot or artificial cooking assistant to support human chefs in the kitchen. Such AI assistants would be of great benefit to society, as they can help to sustain the autonomy of aging adults or people with a physical impairment, or they may reduce the stress in a professional kitchen. We propose a novel approach to computational recipe understanding that mimics the human sense-making process, which is narrative-based. Using an English recipe for almond crescent cookies as illustration, we show how recipes can be modelled as rich narrative structures by integrating various knowledge sources such as language processing, ontologies, and mental simulation. We show how such narrative structures can be used for (a) dealing with the challenges of recipe language, such as zero anaphora, (b) optimizing a robot's planning process, (c) measuring how well an AI system understands its current tasks, and (d) allowing recipe annotations to become language-independent.

artificial intelligence, natural language, recipe, (17 more...)

arXiv.org Artificial Intelligence

2501.01827

Country:

North America > United States (0.68)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.90)

Add feedback

Leveraging Full Dependency Parsing Graph Information For Biomedical Event Extraction

Noravesh, Farshad, Haffari, Reza, Fang, Ong Huey, Soon, Layki, Rajalana, Sailaja, Pal, Arghya

arXiv.org Artificial IntelligenceJan-2-2025

Many models are proposed in the literature on biomedical event extraction(BEE). Some of them use the shortest dependency path(SDP) information to represent the argument classification task. There is an issue with this representation since even missing one word from the dependency parsing graph may totally change the final prediction. To this end, the full adjacency matrix of the dependency graph is used to embed individual tokens using a graph convolutional network(GCN). An ablation study is also done to show the effect of the dependency graph on the overall performance. The results show a significant improvement when dependency graph information is used. The proposed model slightly outperforms state-of-the-art models on BEE over different datasets.

computational linguistic, dependency, extraction, (16 more...)

arXiv.org Artificial Intelligence

2501.01158

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Malaysia (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre:

Research Report > Promising Solution (0.35)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.75)

Add feedback

Loss-Aware Curriculum Learning for Chinese Grammatical Error Correction

Zhang, Ding, Li, Yangning, Bai, Lichen, Zhang, Hao, Li, Yinghui, Lin, Haiye, Zheng, Hai-Tao, Su, Xin, Shan, Zifei

arXiv.org Artificial IntelligenceDec-31-2024

Chinese grammatical error correction (CGEC) aims to detect and correct errors in the input Chinese sentences. Recently, Pre-trained Language Models (PLMS) have been employed to improve the performance. However, current approaches ignore that correction difficulty varies across different instances and treat these samples equally, enhancing the challenge of model learning. To address this problem, we propose a multi-granularity Curriculum Learning (CL) framework. Specifically, we first calculate the correction difficulty of these samples and feed them into the model from easy to hard batch by batch. Then Instance-Level CL is employed to help the model optimize in the appropriate direction automatically by regulating the loss function. Extensive experimental results and comprehensive analyses of various datasets prove the effectiveness of our method.

cgec model, computational linguistic, grammatical error correction, (12 more...)

arXiv.org Artificial Intelligence

2501.00334

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Guangdong Province > Shenzhen (0.08)
North America > Dominican Republic (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (0.64)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.64)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

The Text Classification Pipeline: Starting Shallow going Deeper

Siino, Marco, Tinnirello, Ilenia, La Cascia, Marco

arXiv.org Artificial IntelligenceDec-30-2024

Text Classification (TC) stands as a cornerstone within the realm of Natural Language Processing (NLP), particularly when viewed through the lens of computer science and engineering. The past decade has seen deep learning revolutionize TC, propelling advancements in text retrieval, categorization, information extraction, and summarization. The scholarly literature is rich with datasets, models, and evaluation criteria, with English being the predominant language of focus, despite studies involving Arabic, Chinese, Hindi, and others. The efficacy of TC models relies heavily on their ability to capture intricate textual relationships and nonlinear correlations, necessitating a comprehensive examination of the entire TC pipeline. This monograph provides an in-depth exploration of the TC pipeline, with a particular emphasis on evaluating the impact of each component on the overall performance of TC models. The pipeline includes state-of-the-art datasets, text preprocessing techniques, text representation methods, classification models, evaluation metrics, current results and future trends. Each chapter meticulously examines these stages, presenting technical innovations and significant recent findings. The work critically assesses various classification strategies, offering comparative analyses, examples, case studies, and experimental evaluations. These contributions extend beyond a typical survey, providing a detailed and insightful exploration of TC.

machine learning, natural language, text classification, (25 more...)

arXiv.org Artificial Intelligence

2501.00174

Country:

Europe (1.00)
Asia > Japan > Honshū (0.27)
North America > United States > California (0.27)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(4 more...)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
(11 more...)

Add feedback

Generating event descriptions under syntactic and semantic constraints

Cao, Angela, Holt, Faye, Chan, Jonas, Richter, Stephanie, Glass, Lelia, White, Aaron Steven

arXiv.org Artificial IntelligenceDec-24-2024

With the goal of supporting scalable lexical semantic annotation, analysis, and theorizing, we conduct a comprehensive evaluation of different methods for generating event descriptions under both syntactic constraints -- e.g. desired clause structure -- and semantic constraints -- e.g. desired verb sense. We compare three different methods -- (i) manual generation by experts; (ii) sampling from a corpus annotated for syntactic and semantic information; and (iii) sampling from a language model (LM) conditioned on syntactic and semantic information -- along three dimensions of the generated event descriptions: (a) naturalness, (b) typicality, and (c) distinctiveness. We find that all methods reliably produce natural, typical, and distinctive event descriptions, but that manual generation continues to produce event descriptions that are more natural, typical, and distinctive than the automated generation methods. We conclude that the automated methods we consider produce event descriptions of sufficient quality for use in downstream annotation and analysis insofar as the methods used for this annotation and analysis are robust to a small amount of degradation in the resulting event descriptions.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.18496

Country:

Europe (0.67)
North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Extracting triples from dialogues for conversational social agents

Vossen, Piek, Santamaría, Selene Báez, Bajčetić, Lenka, Belluci, Thomas

arXiv.org Artificial IntelligenceDec-24-2024

Obtaining an explicit understanding of communication within a Hybrid Intelligence collaboration is essential to create controllable and transparent agents. In this paper, we describe a number of Natural Language Understanding models that extract explicit symbolic triples from social conversation. Triple extraction has mostly been developed and tested for Knowledge Base Completion using Wikipedia text and data for training and testing. However, social conversation is very different as a genre in which interlocutors exchange information in sequences of utterances that involve statements, questions, and answers. Phenomena such as co-reference, ellipsis, coordination, and implicit and explicit negation or confirmation are more prominent in conversation than in Wikipedia text. We therefore describe an attempt to fill this gap by releasing data sets for training and testing triple extraction from social conversation. We also created five triple extraction models and tested them in our evaluation data. The highest precision is 51.14 for complete triples and 69.32 for triple elements when tested on single utterances. However, scores for conversational triples that span multiple turns are much lower, showing that extracting knowledge from true conversational data is much more challenging.

extraction, large language model, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.18364

Country:

Europe (0.94)
North America > Mexico (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)
(2 more...)

Add feedback