Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts
The rapidly increasing quality of AI-generated content makes it difficult to distinguish between human-written and AI-generated texts, which may have undesirable consequences for society. It therefore becomes increasingly important to study properties of human texts that are invariant across text domains and writer proficiency levels, can be computed easily for any language, and can robustly separate natural and AI-generated texts regardless of the generation model and sampling method. In this work, we propose such an invariant of human texts, namely the intrinsic dimensionality of the manifold underlying the set of embeddings of a given text sample. We show that the average intrinsic dimensionality of fluent texts in natural language hovers around $9$ for several alphabet-based languages and around $7$ for Chinese, while the average intrinsic dimensionality of AI-generated texts in each language is approximately $1.5$ lower, with a clear statistical separation between the human-written and AI-generated distributions. This property allows us to build a score-based artificial text detector. The proposed detector's accuracy is stable across text domains, generator models, and human writer proficiency levels, outperforming SOTA detectors in model-agnostic and cross-domain scenarios by a significant margin.
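To make the proposed invariant concrete, here is a minimal sketch of estimating the intrinsic dimension of a cloud of text embeddings. It uses the classical Levina-Bickel maximum-likelihood estimator as a generic stand-in, not necessarily the paper's own estimator; the encoder helper `embed_tokens`, the neighbor count, and the decision threshold are illustrative assumptions.

```python
# Sketch only: Levina-Bickel MLE intrinsic-dimension estimate over token
# embeddings, used as a human-vs-AI score. Not the paper's exact method.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mle_intrinsic_dimension(points: np.ndarray, k: int = 20) -> float:
    """Levina-Bickel MLE of intrinsic dimension from k-NN distances,
    averaging the local per-point estimates."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nn.kneighbors(points)      # column 0 is the point itself
    dists = dists[:, 1:]                  # shape (n, k), sorted ascending
    # log-ratio of the k-th neighbor distance to the first k-1 distances
    log_ratios = np.log(dists[:, -1:] / dists[:, :-1])
    local_dims = (k - 1) / log_ratios.sum(axis=1)
    return float(local_dims.mean())

# Usage sketch (names hypothetical): embed each token of a text with any
# sentence/token encoder, estimate the dimension, then threshold it.
# dim = mle_intrinsic_dimension(embed_tokens(text))
# threshold = 8.0   # illustrative: human texts ~9, generated ~1.5 lower
# label = "human" if dim > threshold else "ai-generated"
```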
- North America > Canada (0.04)
- Oceania > Australia (0.04)
- North America > United States (0.04)
- Asia (0.04)
- North America > United States > Alaska > North Slope Borough > Utqiagvik (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Evaluating the Creativity of LLMs in Persian Literary Text Generation
Tourajmehr, Armin, Modarres, Mohammad Reza, Yaghoobzadeh, Yadollah
Large language models (LLMs) have demonstrated notable creative abilities in generating literary texts, including poetry and short stories. However, prior research has primarily centered on English, with limited exploration of non-English literary traditions and without standardized methods for assessing creativity. In this paper, we evaluate the capacity of LLMs to generate Persian literary text enriched with culturally relevant expressions. We build a dataset of user-generated Persian literary texts spanning 20 diverse topics and assess model outputs along four creativity dimensions (originality, fluency, flexibility, and elaboration) by adapting the Torrance Tests of Creative Thinking. To reduce evaluation costs, we adopt an LLM as a judge for automated scoring and validate its reliability against human judgments using intraclass correlation coefficients, observing strong agreement. In addition, we analyze the models' ability to understand and employ four core literary devices: simile, metaphor, hyperbole, and antithesis. Our results highlight both the strengths and limitations of LLMs in Persian literary text generation, underscoring the need for further refinement.
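As a concrete illustration of the validation step the abstract describes, the sketch below computes an intraclass correlation coefficient between an LLM judge and human raters. The column names, toy scores, and the choice of the ICC2 variant (two-way random effects, absolute agreement) are assumptions, not the paper's confirmed setup.

```python
# Hedged sketch: agreement between an LLM judge and human annotators via
# intraclass correlation, using the pingouin library.
import pandas as pd
import pingouin as pg

# Long-format ratings: one row per (story, rater, score). "llm" is the
# automated judge; "h1"/"h2" are human annotators (all names hypothetical).
ratings = pd.DataFrame({
    "story": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "rater": ["h1", "h2", "llm"] * 5,
    "score": [4, 5, 4, 2, 2, 3, 5, 4, 5, 1, 2, 1, 3, 3, 4],
})

icc = pg.intraclass_corr(data=ratings, targets="story",
                         raters="rater", ratings="score")
# The "ICC2" row treats raters as random effects; values above ~0.75 are
# conventionally read as strong agreement.
print(icc.set_index("Type").loc["ICC2", ["ICC", "CI95%"]])
```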
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
ChatGPT-generated texts show authorship traits that identify them as non-human
Dentella, Vittoria, Huang, Weihang, Mansi, Silvia Angela, Grieve, Jack, Leivada, Evelina
Large Language Models can emulate different writing styles, ranging from composing poetry that appears indistinguishable from that of famous poets to using slang that can convince people that they are chatting with a human online. While differences in style may not always be visible to the untrained eye, we can generally distinguish the writing of different people, like a linguistic fingerprint. This work examines whether a language model can also be linked to a specific fingerprint. Through stylometric and multidimensional register analyses, we compare human-authored and model-authored texts from different registers. We find that the model can successfully adapt its style depending on whether it is prompted to produce a Wikipedia entry vs. a college essay, but not in a way that makes it indistinguishable from humans. Concretely, the model shows more limited variation when producing outputs in different registers. Our results suggest that the model prefers nouns to verbs, thus showing a distinct linguistic backbone from humans, who tend to anchor language in the highly grammaticalized dimensions of tense, aspect, and mood. It is possible that the more complex domains of grammar reflect a mode of thought unique to humans, thus acting as a litmus test for Artificial Intelligence.
Introduction: Scholars from different disciplines have been addressing the question of what makes us human for centuries. For Nobel laureate Bertrand Russell, the answer is language, for "no matter how eloquently a dog may bark, he cannot tell you that his parents were poor but honest". Human language is both flexible and constrained at the same time, and this is why the Turing Test, described as a litmus test for Artificial Intelligence [Shieber 1994, French 2000], is linked to achieving a level of conversational proficiency that is highly complex, akin to that of a human [Turing 1950]. Human language is flexible in the sense that we all make different choices when conversing. Every human is thought to have a distinct linguistic fingerprint called an idiolect [Halliday et al. 1964, Coulthard 2004]. This idiolect, which can be defined as an individual's unique use of linguistic forms (including lexical choices, collocations and fixed expressions, punctuation patterns, misspellings, and grammatical style), is critical for authorship attribution in a range of situations: from identifying that a poem with dashes, elliptical syntax, and unconventional capitalization is more likely authored by Emily Dickinson than by William Shakespeare, to pinning down a person of interest in the course of a criminal investigation, as happened in the Unabomber case.
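One signal the abstract singles out, the preference for nouns over verbs, is easy to measure. The sketch below computes a noun-to-verb ratio with spaCy; it is a generic one-feature illustration, not the authors' full stylometric or register analysis, and it assumes the small English model is installed (`python -m spacy download en_core_web_sm`).

```python
# Minimal sketch of a single stylometric signal: the noun-to-verb ratio.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the model is installed

def noun_verb_ratio(text: str) -> float:
    """Ratio of common-noun tokens to verb tokens (coarse POS tags)."""
    doc = nlp(text)
    nouns = sum(t.pos_ == "NOUN" for t in doc)
    verbs = sum(t.pos_ == "VERB" for t in doc)
    return nouns / max(verbs, 1)    # avoid division by zero

# Per the abstract, model-authored text should skew toward higher values
# than human-authored text from the same register.
print(noun_verb_ratio("The quick brown fox jumps over the lazy dog."))
```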
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Europe > Italy (0.04)
- North America > United States > Nebraska (0.04)
- Law (0.48)
- Health & Medicine (0.46)
DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains
Chen, Zhihui, He, Kai, Huang, Yucheng, Zhu, Yunxiao, Feng, Mengling
Detecting LLM-generated text in specialized and high-stakes domains like medicine and law is crucial for combating misinformation and ensuring authenticity. However, current zero-shot detectors, while effective on general text, often fail when applied to specialized content due to domain shift. We provide a theoretical analysis showing this failure is fundamentally linked to the KL divergence between the human, detector, and source text distributions. To address this, we propose DivScore, a zero-shot detection framework using normalized entropy-based scoring and domain knowledge distillation to robustly identify LLM-generated text in specialized domains. We also release a domain-specific benchmark for LLM-generated text detection in the medical and legal domains. Experiments on our benchmark show that DivScore consistently outperforms state-of-the-art detectors, with 14.4% higher AUROC and 64.0% higher recall at a 0.1% false-positive-rate threshold. In adversarial settings, DivScore is more robust than other baselines, achieving an average advantage of 22.8% in AUROC and 29.5% in recall. Code and data are publicly available.
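To ground the "entropy-based scoring" idea, here is a hedged sketch that scores a text by the mean predictive entropy of a proxy language model. DivScore's exact normalization and its distilled domain detector are not reproduced; the GPT-2 checkpoint is an illustrative assumption.

```python
# Sketch of generic entropy scoring for zero-shot detection: LLM-generated
# text tends to have lower average next-token entropy under a proxy LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def mean_token_entropy(text: str) -> float:
    """Average entropy (nats) of the LM's next-token distribution."""
    ids = tok(text, return_tensors="pt").input_ids
    logits = lm(ids).logits[0, :-1]             # predictions for tokens 2..n
    logp = torch.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)  # per-position entropy
    return entropy.mean().item()

# A threshold calibrated on held-out human/machine text turns this score
# into a detector, e.g.:
# label = "ai-generated" if mean_token_entropy(text) < threshold else "human"
```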
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Texas (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Government (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text
Al-Shaibani, Maged S., Ahmed, Moataz
Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text, posing subtle yet significant challenges for information integrity across critical domains including education, social media, and academia: they enable sophisticated misinformation campaigns, compromise healthcare guidance, and facilitate targeted propaganda. The challenge is particularly severe in under-explored, low-resource languages like Arabic. This paper presents a comprehensive investigation of Arabic machine-generated text, examining multiple generation strategies (generation from the title only, content-aware generation, and text refinement) across diverse model architectures (ALLaM, Jais, Llama, and GPT-4) in the academic and social media domains. Our stylometric analysis reveals distinctive linguistic patterns differentiating human-written from machine-generated Arabic text across these varied contexts. Despite their human-like qualities, we demonstrate that LLMs produce detectable signatures in their Arabic outputs, with domain-specific characteristics that vary significantly between contexts. Based on these insights, we developed BERT-based detection models that achieve exceptional performance in formal contexts (up to a 99.9% F1-score) with strong precision across model architectures. Our cross-domain analysis confirms generalization challenges previously reported in the literature. To the best of our knowledge, this work represents the most comprehensive investigation of Arabic machine-generated text to date, uniquely combining multiple prompt-based generation methods, diverse model architectures, and in-depth stylometric analysis across varied textual domains, establishing a foundation for robust, linguistically informed detection systems essential for preserving information integrity in Arabic-language contexts.
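The sketch below shows the kind of BERT-based detector the abstract describes, fine-tuned as a binary human/machine classifier with Hugging Face transformers. The AraBERT checkpoint, hyperparameters, and dataset wiring are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: fine-tuning an Arabic BERT encoder as a machine-text
# detector. Dataset variables are placeholders, not real data.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "aubmindlab/bert-base-arabertv2"   # assumed Arabic encoder
tok = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)                   # 0 = human, 1 = machine

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=512)

# `train_ds` / `eval_ds` stand in for a labeled corpus, e.g. a
# datasets.Dataset with "text" and "label" columns (hypothetical):
# train_ds = train_ds.map(tokenize, batched=True)
# eval_ds = eval_ds.map(tokenize, batched=True)
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments("arabic-detector", num_train_epochs=3,
#                            per_device_train_batch_size=16),
#     train_dataset=train_ds, eval_dataset=eval_ds,
# )
# trainer.train()
```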
The Reader is the Metric: How Textual Features and Reader Profiles Explain Conflicting Evaluations of AI Creative Writing
Marco, Guillermo, Gonzalo, Julio, Fresno, Víctor
Recent studies comparing AI-generated and human-authored literary texts have produced conflicting results: some suggest AI already surpasses human quality, while others argue it still falls short. We start from the hypothesis that such divergences can be largely explained by genuine differences in how readers interpret and value literature, rather than by an intrinsic quality of the texts evaluated. Using five public datasets (1,471 stories, 101 annotators including critics, students, and lay readers), we (i) extract 17 reference-less textual features (e.g., coherence, emotional variance, average sentence length...); (ii) model individual reader preferences, deriving feature importance vectors that reflect their textual priorities; and (iii) analyze these vectors in a shared "preference space". Reader vectors cluster into two profiles: 'surface-focused readers' (mainly non-experts), who prioritize readability and textual richness; and 'holistic readers' (mainly experts), who value thematic development, rhetorical variety, and sentiment dynamics. Our results quantitatively explain how measurements of literary quality are a function of how text features align with each reader's preferences. These findings advocate for reader-sensitive evaluation frameworks in the field of creative text generation.
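The three-step pipeline in the abstract maps naturally onto standard tooling; here is a minimal sketch under stated assumptions: ridge regression as the per-reader preference model and k-means as the clustering step (the paper may use different estimators), with random arrays standing in for the real features and ratings.

```python
# Hedged sketch: one linear model per reader (ratings ~ textual features),
# coefficients as that reader's feature-importance vector, then clustering
# the vectors into two reader profiles. Data here is synthetic filler.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_readers, n_stories, n_features = 101, 60, 17
X = rng.normal(size=(n_stories, n_features))   # 17 reference-less features
Y = rng.normal(size=(n_readers, n_stories))    # each reader's story ratings

# One importance vector per reader: ridge coefficients on shared features.
vectors = np.stack([Ridge(alpha=1.0).fit(X, y).coef_ for y in Y])

# Two clusters in the shared "preference space", mirroring the paper's
# surface-focused vs. holistic reader profiles.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    normalize(vectors))
print(np.bincount(labels))   # cluster sizes
```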
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Spain > Galicia > Madrid (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)