AITopics | syntactic feature

syntactic feature

Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features

Sung, Hakyung, Csuros, Karla, Sung, Min-Chang

arXiv.org Artificial Intelligence, Oct-14-2025

This study examines the lexical and syntactic interventions of human and LLM proofreading aimed at improving overall intelligibility in identical second language writings, and evaluates the consistency of outcomes across three LLMs (ChatGPT-4o, Llama3.1-8b, Deepseek-r1-8b). Findings show that both human and LLM proofreading enhance bigram lexical features, which may contribute to better coherence and contextual connectedness between adjacent words. However, LLM proofreading exhibits a more generative approach, extensively reworking vocabulary and sentence structures, such as employing more diverse and sophisticated vocabulary and incorporating a greater number of adjective modifiers in noun phrases. The proofreading outcomes are highly consistent in major lexical and syntactic features across the three models.

large language model, machine learning, natural language, (20 more...)

doi: 10.18653/v1/2025.bea-1.2

2506.09021

Country:

Asia (1.00)
Europe (0.68)
North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multilingual Irony Detection with Dependency Syntax and Neural Models

Cignarella, Alessandra Teresa, Basile, Valerio, Sanguinetti, Manuela, Bosco, Cristina, Rosso, Paolo, Benamara, Farah

arXiv.org Artificial Intelligence, May-26-2025

This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second scenario, two well-known types of word embeddings are trained on parsed data and tested against gold standard datasets. In the third setting, dependency-based syntactic features are combined into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.18653/v1/2020.coling-main.116

2011.05706

Country: Europe > France (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)

Do Large Language Models know who did what to whom?

Denning, Joseph M., Guo, Xiaohan Hannah, Snefjella, Bryor, Blank, Idan A.

arXiv.org Artificial Intelligence, Apr-29-2025

Large Language Models (LLMs) are commonly criticized for not "understanding" language. However, many critiques focus on cognitive abilities that, in humans, are distinct from language processing. Here, we instead study a kind of understanding tightly linked to language: inferring "who did what to whom" (thematic roles) in a sentence. Does the central training objective of LLMs--word prediction--result in sentence representations that capture thematic roles? In two experiments, we characterized sentence representations in four LLMs. In contrast to human similarity judgments, in LLMs the overall representational similarity of sentence pairs reflected syntactic similarity but not whether their agent and patient assignments were identical vs. reversed. Furthermore, we found little evidence that thematic role information was available in any subset of hidden units. However, some attention heads robustly captured thematic roles, independently of syntax. Therefore, LLMs can extract thematic roles but, relative to humans, this information influences their representations more weakly.
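The notion of representational similarity between sentence pairs can be illustrated with a minimal sketch. It assumes cosine similarity as the measure and uses made-up placeholder vectors in place of real LLM hidden states; neither the measure nor the numbers are taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Placeholder vectors standing in for sentence representations.
# They are invented for illustration and carry no experimental meaning.
base_sentence = [1.0, 0.0, 2.0]
same_direction = [2.0, 0.0, 4.0]   # a scaled copy of base_sentence
orthogonal = [0.0, 1.0, 0.0]       # shares no direction with base_sentence

similar_score = cosine_similarity(base_sentence, same_direction)
dissimilar_score = cosine_similarity(base_sentence, orthogonal)
```

In a study of this kind, such scores would be computed for many sentence pairs and then compared across conditions, for example pairs with identical versus reversed agent and patient assignments.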

large language model, machine learning, thematic role assignment, (20 more...)

2504.16884

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Analyzing the Inner Workings of Transformers in Compositional Generalization

Kumon, Ryoma, Yanaka, Hitomi

arXiv.org Artificial Intelligence, Feb-21-2025

The compositional generalization abilities of neural models have been sought after for human-like linguistic competence. The popular method to evaluate such abilities is to assess the models' input-output behavior. However, that does not reveal the internal mechanisms, and the underlying competence of such models in compositional generalization remains unclear. To address this problem, we explore the inner workings of a Transformer model by finding an existing subnetwork that contributes to the generalization performance and by performing causal analyses on how the model utilizes syntactic features. We find that the model depends on syntactic features to output the correct answer, but that the subnetwork with much better generalization performance than the whole model relies on a non-compositional algorithm in addition to the syntactic features. We also show that the subnetwork improves its generalization performance relatively slowly during the training compared to the in-distribution one, and the non-compositional solution is acquired in the early stages of the training.

computational linguistic, generalization, subnetwork, (13 more...)

2502.15277

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Detection of LLM-Generated Java Code Using Discretized Nested Bigrams

Paek, Timothy, Mohan, Chilukuri

arXiv.org Artificial Intelligence, Feb-7-2025

Large Language Models (LLMs) are currently used extensively to generate code by professionals and students, motivating the development of tools to detect LLM-generated code for applications such as academic integrity and cybersecurity. We address this authorship attribution problem as a binary classification task along with feature identification and extraction. We propose new Discretized Nested Bigram Frequency features on source code groups of various sizes. Compared to prior work, improvements are obtained by representing sparse information in dense membership bins. Experimental evaluation demonstrated that our approach significantly outperformed a commonly used GPT code-detection API and baseline features, with accuracy exceeding 96% compared to 72% and 79% respectively in detecting GPT-rewritten Java code fragments for 976 files with GPT 3.5 and GPT4 using 12 features. We also outperformed three prior works on code author identification in a 40-author dataset. Our approach scales well to larger data sets, and we achieved 99% accuracy and 0.999 AUC for 76,089 files and over 1,000 authors with GPT 4o using 227 features.
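The idea of turning sparse bigram frequencies into dense bin-membership counts can be sketched as follows. This is a simplified illustration, not the authors' feature definition: whitespace tokenization, equal-width bins over [0, 1], and the function names are all assumptions, and the "nested" aspect of the proposed features is not modeled.

```python
from collections import Counter


def bigram_relative_frequencies(tokens):
    """Relative frequency of each adjacent token pair in the sequence."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}


def equal_width_bin_counts(frequencies, num_bins=4):
    """Count how many distinct bigrams fall into each equal-width bin of [0, 1]."""
    bins = [0] * num_bins
    for freq in frequencies.values():
        # A frequency of exactly 1.0 is placed in the last bin.
        index = min(int(freq * num_bins), num_bins - 1)
        bins[index] += 1
    return bins


# Toy code fragment, tokenized on whitespace (an assumption for brevity).
tokens = "int i = 0 ; int j = 0 ;".split()
freqs = bigram_relative_frequencies(tokens)
feature_vector = equal_width_bin_counts(freqs, num_bins=4)
```

The resulting vector has a fixed length regardless of how many distinct bigrams a file contains, which is what makes it usable as input to a standard binary classifier.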

accuracy, dataset, ewd-nb-f, (12 more...)

2502.1574

Country: North America > United States > New York > Onondaga County > Syracuse (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.34)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

What is a word?

Murphy, Elliot

arXiv.org Artificial Intelligence, Feb-19-2024

"Despite 2,400 years or so of trying, it is unclear that anyone has ever come up with an adequate definition of any word whatsoever, even the simplest." Surprisingly few linguists and philosophers have a clear model of what a word is, even though words impact basically every aspect of human life. Researchers that regularly publish academic papers about language often rely on outdated, or inaccurate, assumptions about wordhood. As in all scientific disciplines, we have two notions to consider: 1. Our intuitive concept of 'word' (which we all have, even though it can be vague, and sometimes hard to articulate fully, like most complex concepts). This is no different from other scientific concepts - for example, 'water' has a very intuitive meaning, but it also is linked to much more technical, formal notions emerging from chemistry and physics (Murphy 2023). This short pedagogical document outlines what the lexicon is most certainly not (though is often mistakenly taken to be), what it might be (based on current good theories), and what some implications for experimental design are. The central features of lexical items have no connection with sensorimotor instructions.

instruction, lexicon, syntax, (16 more...)

2402.12605

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States > Texas > Harris County > Houston (0.04)
(2 more...)

Genre: Research Report (0.70)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

Turkish Native Language Identification

Uluslu, Ahmet Yavuz, Schneider, Gerold

arXiv.org Artificial Intelligence, Nov-4-2023

In this paper, we present the first application of Native Language Identification (NLI) for the Turkish language. NLI involves predicting the writer's first language by analysing their writing in different languages. While most NLI research has focused on English, our study extends its scope to Turkish. We used the recently constructed Turkish Learner Corpus and employed a combination of three syntactic features (CFG production rules, part-of-speech n-grams, and function words) with L2 texts to demonstrate their effectiveness in this task.
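Two of the three feature families named above, part-of-speech n-grams and function words, can be sketched without any external tools. The sketch assumes tokens that are already POS-tagged, uses a tiny hypothetical function-word list, and omits CFG production rules because those require a constituency parser; none of this reproduces the authors' actual pipeline.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = {"ve", "bir", "bu", "de", "da", "için", "ile"}


def pos_ngrams(pos_tags, n=2):
    """Count contiguous n-grams over a sequence of part-of-speech tags."""
    return Counter(
        tuple(pos_tags[i:i + n]) for i in range(len(pos_tags) - n + 1)
    )


def function_word_counts(tokens):
    """Count listed function words after lowercasing.

    Note: str.lower() does not apply Turkish casing rules for dotted and
    dotless I, so a real system would need locale-aware normalization.
    """
    lowered = (token.lower() for token in tokens)
    return Counter(word for word in lowered if word in FUNCTION_WORDS)


def extract_features(tagged_tokens, n=2):
    """Merge both feature families into one sparse feature dictionary."""
    tokens = [token for token, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    features = {}
    for gram, count in pos_ngrams(tags, n).items():
        features["pos:" + "_".join(gram)] = count
    for word, count in function_word_counts(tokens).items():
        features["fw:" + word] = count
    return features


# Hand-tagged toy sentence ("Bu bir kitap ve bu bir kalem"); the tags are
# assigned by hand for the example, not produced by a tagger.
sentence = [
    ("Bu", "DET"), ("bir", "DET"), ("kitap", "NOUN"), ("ve", "CCONJ"),
    ("bu", "DET"), ("bir", "DET"), ("kalem", "NOUN"),
]
features = extract_features(sentence)
```

A dictionary of this shape can be passed to any classifier that accepts sparse count features, with one training example per learner text and the writer's first language as the label.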

identification, language identification, native language identification, (12 more...)

2307.1485

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
South America > Brazil (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
Asia > Afghanistan (0.04)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.35)

Syntactic Structure Processing in the Brain while Listening

Oota, Subba Reddy, Marreddy, Mounika, Gupta, Manish, Surampud, Bapi Raju

arXiv.org Artificial Intelligence, Feb-16-2023

Syntactic parsing is the task of assigning a syntactic structure to a sentence. There are two popular syntactic parsing methods: constituency and dependency parsing. Recent works have used syntactic embeddings based on constituency trees, incremental top-down parsing, and other word syntactic features for brain activity prediction given the text stimuli to study how the syntax structure is represented in the brain's language network. However, the effectiveness of dependency parse trees or the relative predictive power of the various syntax parsers across brain areas, especially for the listening task, is yet unexplored. In this study, we investigate the predictive power of the brain encoding models in three settings: (i) individual performance of the constituency and dependency syntactic parsing based embedding methods, (ii) efficacy of these syntactic parsing based embedding methods when controlling for basic syntactic signals, (iii) relative effectiveness of each of the syntactic embedding methods when controlling for the other. Further, we explore the relative importance of syntactic information (from these syntactic embedding methods) versus semantic information using BERT embeddings. We find that constituency parsers help explain activations in the temporal lobe and middle-frontal gyrus, while dependency parsers better encode syntactic structure in the angular gyrus and posterior cingulate cortex. Although semantic signals from BERT are more effective compared to any of the syntactic features or embedding methods, syntactic embedding methods explain additional variance for a few brain regions.

artificial intelligence, hemisphere, natural language, (16 more...)

2302.08589

Country:

South America > Brazil (0.04)
North America > United States > New York > Bronx County > New York City (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Tapping the Potential of Coherence and Syntactic Features in Neural Models for Automatic Essay Scoring

Qiu, Xinying, Liao, Shuxuan, Xie, Jiajun, Nie, Jian-Yun

arXiv.org Artificial Intelligence, Nov-23-2022

In the prompt-specific holistic score prediction task for Automatic Essay Scoring (AES), the general approaches include pre-trained neural models, coherence models, and hybrid models that incorporate syntactic features into a neural model. In this paper, we propose a novel approach to extract and represent essay coherence features with prompt-learning NSP, which is shown to match the state-of-the-art AES coherence model and achieves the best performance for long essays. We apply syntactic feature dense embeddings to augment a BERT-based model and achieve the best performance among hybrid methodologies for AES. In addition, we explore various ideas to combine coherence, syntactic information and semantic embeddings, which no previous study has done. Our combined model also performs better than the SOTA available for combined models, even though it does not outperform our syntactically enhanced neural model. We further offer analyses that can be useful for future study.

artificial intelligence, machine learning, natural language, (17 more...)

2211.13373

Country:

Asia > China > Guangdong Province > Guangzhou (0.05)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > Promising Solution (0.49)

Industry: Education > Assessment & Standards > Student Performance (0.75)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Syntactic structures and the general Markov models

Gakkhar, Sitanshu, Marcolli, Matilde

arXiv.org Artificial Intelligence, Oct-18-2022

The focus of the present paper is to investigate the following questions: to what extent syntactic features capture phylogenetic relationships and to what extent Markov models are a viable assumption for phylogenetic reconstruction based on syntactic features. For the second, we also consider an alternative that we argue approximates the infinite site evolutionary model. These questions are motivated by the fact that at both the lexical and syntactic levels, Markov processes are commonly assumed to underlie computational models of language change; for instance, within the Principles and Parameters setting relevant here, Niyogi and Berwick (1997) developed models of language acquisition and language change based on a Markov process in a space of syntactic parameters. In this paper we focus only on language change processes, viewed through the lens of phylogenetic trees of language families. While the models we consider are not directly related to models of language acquisition and parameter setting, the historical changes of syntax within and across language families, through the modification of syntactic parameters, can be seen as an effect of such underlying dynamics.

machine learning, markov model, natural language, (18 more...)

2104.08462

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
