AITopics | lexical richness

Collaborating Authors

lexical richness

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Show or Tell? Modeling the evolution of request-making in Human-LLM conversations

Zhu, Shengqi, Rzeszotarski, Jeffrey M., Mimno, David

arXiv.org Artificial IntelligenceOct-9-2025

Designing user-centered LLM systems requires understanding how people use them, but patterns of user behavior are often masked by the variability of queries. In this work, we introduce a new framework to describe request-making that segments user input into request content, roles assigned, query-specific context, and the remaining task-independent expressions. We apply the workflow to create and analyze a dataset of 211k real-world queries based on WildChat. Compared with similar human-human setups, we find significant differences in the language for request-making in the human-LLM scenario. Further, we introduce a novel and essential perspective of diachronic analyses with user expressions, which reveals fundamental and habitual user-LLM interaction patterns beyond individual task completion. We find that query patterns evolve from early ones emphasizing sole requests to combining more context later on, and individual users explore expression patterns but tend to converge with more experience. From there, we propose to understand communal trends of expressions underlying distinct tasks and discuss the preliminary findings. Finally, we discuss the key implications for user studies, computational pragmatics, and LLM alignment.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.01213

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Can Large Language Models (LLMs) Describe Pictures Like Children? A Comparative Corpus Study

Woloszyn, Hanna, Gagl, Benjamin

arXiv.org Artificial IntelligenceAug-20-2025

The role of large language models (LLMs) in education is increasing, yet little attention has been paid to whether LLM-generated text resembles child language. This study evaluates how LLMs replicate child-like language by comparing LLM-generated texts to a collection of German children's descriptions of picture stories. We generated two LLM-based corpora using the same picture stories and two prompt types: zero-shot and few-shot prompts specifying a general age from the children corpus. We conducted a comparative analysis across psycholinguistic text properties, including word frequency, lexical richness, sentence and word length, part-of-speech tags, and semantic similarity with word embeddings. The results show that LLM-generated texts are longer but less lexically rich, rely more on high-frequency words, and under-represent nouns. Semantic vector space analysis revealed low similarity, highlighting differences between the two corpora on the level of corpus semantics. Few-shot prompt increased similarities between children and LLM text to a minor extent, but still failed to replicate lexical and semantic patterns. The findings contribute to our understanding of how LLMs approximate child language through multimodal prompting (text + image) and give insights into their use in psycholinguistic research and education while raising important questions about the appropriateness of LLM-generated language in child-directed educational tools.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.13769

Country: Europe (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Paraphrase Types Elicit Prompt Engineering Capabilities

Wahle, Jan Philip, Ruas, Terry, Xu, Yang, Gipp, Bela

arXiv.org Artificial IntelligenceJun-28-2024

Much of the success of modern language models depends on finding a suitable prompt to instruct the model. Until now, it has been largely unknown how variations in the linguistic expression of prompts affect these models. This study systematically and empirically evaluates which linguistic features influence models through paraphrase types, i.e., different linguistic changes at particular positions. We measure behavioral changes for five models across 120 tasks and six families of paraphrases (i.e., morphology, syntax, lexicon, lexico-syntax, discourse, and others). We also control for other prompt engineering factors (e.g., prompt length, lexical diversity, and proximity to training data). Our results show a potential for language models to improve tasks when their prompts are adapted in specific paraphrase types (e.g., 6.7% median gain in Mixtral 8x7B; 5.5% in LLaMA 3 8B). In particular, changes in morphology and lexicon, i.e., the vocabulary used, showed promise in improving prompts. These findings contribute to developing more robust language models capable of handling variability in linguistic expression.

computational linguistic, language model, performance gain, (13 more...)

arXiv.org Artificial Intelligence

2406.19898

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Lower Saxony > Gottingen (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Beware of Words: Evaluating the Lexical Richness of Conversational Large Language Models

Martínez, Gonzalo, Hernández, José Alberto, Conde, Javier, Reviriego, Pedro, Merino, Elena

arXiv.org Artificial IntelligenceFeb-11-2024

The performance of conversational Large Language Models (LLMs) in general, and of ChatGPT in particular, is currently being evaluated on many different tasks, from logical reasoning or maths to answering questions on a myriad of topics. Instead, much less attention is being devoted to the study of the linguistic features of the texts generated by these LLMs. This is surprising since LLMs are models for language, and understanding how they use the language is important. Indeed, conversational LLMs are poised to have a significant impact on the evolution of languages as they may eventually dominate the creation of new text. This means that for example, if conversational LLMs do not use a word it may become less and less frequent and eventually stop being used altogether. Therefore, evaluating the linguistic features of the text they produce and how those depend on the model parameters is the first step toward understanding the potential impact of conversational LLMs on the evolution of languages. In this paper, we consider the evaluation of the lexical richness of the text generated by LLMs and how it depends on the model parameters. A methodology is presented and used to conduct a comprehensive evaluation of lexical richness using ChatGPT as a case study. The results show how lexical richness depends on the version of ChatGPT and some of its parameters, such as the presence penalty, or on the role assigned to the model. The dataset and tools used in our analysis are released under open licenses with the goal of drawing the much-needed attention to the evaluation of the linguistic features of LLM-generated text.

evaluation, lexical richness, llm, (12 more...)

arXiv.org Artificial Intelligence

2402.15518

Country:

Europe > Spain > Galicia > Madrid (0.05)
Europe > Spain > Castile and León > Valladolid Province > Valladolid (0.04)
North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Content Reduction, Surprisal and Information Density Estimation for Long Documents

Ji, Shaoxiong, Sun, Wei, Marttinen, Pekka

arXiv.org Artificial IntelligenceSep-12-2023

Many computational linguistic methods have been proposed to study the information content of languages. We consider two interesting research questions: 1) how is information distributed over long documents, and 2) how does content reduction, such as token selection and text summarization, affect the information density in long documents. We present four criteria for information density estimation for long documents, including surprisal, entropy, uniform information density, and lexical density. Among those criteria, the first three adopt the measures from information theory. We propose an attention-based word selection method for clinical notes and study machine summarization for multiple-domain documents. Our findings reveal the systematic difference in information density of long text in various domains. Empirical results on automated medical coding from long clinical notes show the effectiveness of the attention-based word selection method.

dataset, information, information density, (16 more...)

arXiv.org Artificial Intelligence

2309.06009

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > Oregon (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Language Analysis of Speakers with Dementia of the Alzheimer’s Type

Guinn, Curry I. (University of North Carolina Wilmington) | Habash, Anthony (University of North Carolina Wilmington)

AAAI ConferencesNov-5-2012

This research is a discriminative analysis of conversational dialogs involving individuals suffering from dementia of Alzheimer’s type. Several metric analyses are applied to the transcripts of the Carolina Conversation Corpus (Pope and Davis 2011) in order to determine if there are significant statistical differences between individuals with and without Alzheimer’s disease. Results from the analysis indicate that go-ahead utterances, certain fluency measures, and paraphrasing provide defensible means of differentiating the linguistic characteristics of spontaneous speech between healthy individuals and those with Alzheimer’s disease. Several machine learning algorithms were used to classify the speech of individuals with and without dementia of the Alzheimer’s type.

alzheimer, artificial intelligence, machine learning, (18 more...)

AAAI Conferences

2012 AAAI Fall Symposium Series

Country:

North America > United States > Texas (0.04)
North America > United States > South Carolina (0.04)
North America > United States > North Carolina > New Hanover County > Wilmington (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)

Add feedback