coh-metrix
Long-form analogies generated by chatGPT lack human-like psycholinguistic properties
Seals, S. M., Shalin, Valerie L.
Psycholinguistic analyses provide a means of evaluating large language model (LLM) output and making systematic comparisons to human-generated text. These methods can be used to characterize the psycholinguistic properties of LLM output and illustrate areas where LLMs fall short in comparison to human-generated text. In this work, we apply psycholinguistic methods to evaluate individual sentences from long-form analogies about biochemical concepts. We compare analogies generated by human subjects enrolled in introductory biochemistry courses to analogies generated by chatGPT. We perform a supervised classification analysis using 78 features extracted from Coh-Metrix that analyze text cohesion, language, and readability (Graesser et al., 2004). Results illustrate high performance for classifying student-generated and chatGPT-generated analogies. To evaluate which features contribute most to model performance, we use a hierarchical clustering approach. Results from this analysis illustrate several linguistic differences between the two sources.
Chinese Intermediate English Learners outdid ChatGPT in deep cohesion: Evidence from English narrative writing
Zhou, Tongquan, Cao, Siyi, Zhou, Siruo, Zhang, Yao, He, Aijing
ChatGPT is a publicly available chatbot that can quickly generate texts on given topics, but it is unknown whether the chatbot is really superior to human writers in all aspects of writing and whether its writing quality can be substantially improved by updating commands. Consequently, this study compared the writing performance on a narrative topic by ChatGPT and Chinese intermediate English (CIE) learners so as to reveal the chatbot's advantages and disadvantages in writing. The data were analyzed in terms of five discourse components using Coh-Metrix (a special instrument for analyzing language discourses), and the results revealed that ChatGPT, in its initial version, performed better than human writers in narrativity, word concreteness, and referential cohesion, but worse in syntactic simplicity and deep cohesion. After further revision commands were issued, the resulting version improved in syntactic simplicity but still lagged far behind CIE learners' writing in deep cohesion. In addition, the correlation analysis of the discourse components suggests that narrativity was correlated with referential cohesion in both ChatGPT and human writers, but the correlations varied within each group.
A Metric Scale for 'Abstractness' of the Word Meaning
Samsonovich, Alexei V. (George Mason University)
Web personalization involves automated content analysis of text, and modern technologies of semantic analysis of text rely on a number of scales. Among them is the abstractness of meaning, which is not captured by more traditional measures of sentiment, such as valence, arousal and dominance. The present work introduces a physics-inspired approach to constructing the abstractness scale based on databases of hypernym-hyponym relations, e.g., WordNet 3.0. The idea is to define an energy as a function of word coordinates that are distributed in one dimension, and then to find a global minimum of this energy function by relocating words in this dimension. The result is a one-dimensional distribution that assigns "abstractness" values to words. While positions of individual words on this scale are subject to noise, the entire distribution globally defines the universal semantic dimension associated with the notion of hypernym-hyponym relations, called here "abstractness".
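The energy-minimization idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the energy function, so the quadratic form here (each hypernym is pulled one unit above its hyponym, plus a small regularizer) is an assumption made purely for illustration. Plain gradient descent is swapped in as the minimizer, and the word pairs are toy data rather than WordNet 3.0 entries.

```python
# Toy hypernym -> hyponym pairs (illustrative only, not drawn from WordNet).
PAIRS = [
    ("entity", "animal"),
    ("animal", "dog"),
    ("animal", "cat"),
    ("dog", "poodle"),
]


def energy(x, pairs, margin=1.0, reg=0.01):
    """Assumed energy: every hypernym should sit `margin` above its hyponym."""
    pull = sum((x[a] - x[b] - margin) ** 2 for a, b in pairs)
    spread = reg * sum(v * v for v in x.values())  # keeps coordinates bounded
    return pull + spread


def minimize(pairs, steps=2000, lr=0.05, margin=1.0, reg=0.01):
    """Relocate words along one dimension by gradient descent on `energy`."""
    words = sorted({w for pair in pairs for w in pair})
    x = {w: 0.0 for w in words}  # all words start at the origin
    for _ in range(steps):
        grad = {w: 2.0 * reg * x[w] for w in words}
        for a, b in pairs:
            d = 2.0 * (x[a] - x[b] - margin)
            grad[a] += d
            grad[b] -= d
        for w in words:
            x[w] -= lr * grad[w]
    return x


scale = minimize(PAIRS)
# Print words from most to least abstract on the learned scale.
for word, value in sorted(scale.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {value:+.2f}")
```

Because this assumed energy is convex, gradient descent reaches its single minimum. The energy used in the paper may have a different shape and may need a different optimizer to find a global minimum.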
A Linguistic Analysis of Expert-Generated Paraphrases
Brandon, Russell D. (Arizona State University) | Crossley, Scott A. (Georgia State University) | McNamara, Danielle S. (Arizona State University)
The authors used the computational tool Coh-Metrix to examine expert writers' paraphrases and, in particular, how experts paraphrase text passages using condensing strategies. The overarching goal of this study was to develop machine learning algorithms to aid in the automatic detection of paraphrases and paraphrase types. To this end, three experts were instructed to paraphrase by condensing a set of target passages. The linguistic differences between the original passages and the condensed paraphrases were then analyzed using Coh-Metrix. The condensed paraphrases were accurately distinguished from the original target passages based on the number of words, word frequency, and syntactic complexity.
Number of Words Versus Number of Ideas: Finding a Better Predictor of Writing Quality
Weston, Jennifer L. (University of Memphis) | Crossley, Scott A. (Georgia State University) | McCarthy, Philip M. (University of Memphis) | McNamara, Danielle S. (University of Memphis)
This study examines the relation between the linguistic features of freewrites and human assessments of freewriting quality. This study builds upon the authors' previous studies in which a model was developed based on the linguistic features of freewrites written by 9th and 11th grade students to predict freewrite quality. The current study reexamines this model using number of propositions as a predictor instead of number of words because the number of propositions was expected to be a better proxy for number of ideas in contrast to simple text length. The results indicated that there were only slight advantages for using a measure for number of propositions, indicating that from an artificial intelligence perspective, the number of words was the better measure.
Automated Assessment of Paragraph Quality: Introduction, Body, and Conclusion Paragraphs
Roscoe, Rod (University of Memphis) | Crossley, Scott (Georgia State University) | Weston, Jennifer (University of Memphis) | McNamara, Danielle (University of Memphis)
Natural language processing and statistical methods were used to identify linguistic features associated with the quality of student-generated paragraphs. Linguistic features were assessed using Coh-Metrix. The resulting computational models demonstrated small to medium effect sizes for predicting paragraph quality: introduction quality r² = .25, body quality r² = .10, and conclusion quality r² = .11. Although the variance explained was somewhat low, the linguistic features identified were consistent with the rhetorical goals of paragraph types. Avenues for bolstering this approach by considering individual writing styles and techniques are considered.
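For readers unfamiliar with the effect-size figures above, the following minimal sketch shows how a variance-explained statistic can be computed, assuming the reported values are the usual coefficient of determination. The scores are made-up toy numbers, not data from the study, and `r_squared` is a hypothetical helper name.

```python
def r_squared(observed, predicted):
    """Share of variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy human quality ratings and model predictions (illustrative only).
human_scores = [3.0, 4.0, 2.0, 5.0]
model_scores = [3.5, 3.5, 2.5, 4.5]
print(round(r_squared(human_scores, model_scores), 2))
```

On this reading, a value of .25 would mean that the model's predictions account for a quarter of the variance in the human ratings.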
Assessment of LDAT as a Grammatical Diversity Assessment Tool
Healy, Scott Leigh (The University of Memphis) | Weintraub, Joseph D. (The University of Memphis) | McCarthy, Philip M. (The University of Memphis) | Hall, Charles E. (The University of Memphis) | McNamara, Danielle S. (The University of Memphis)
The purpose of this study is to evaluate the validity of measuring grammatical diversity with a specifically designed Lexical Diversity Assessment Tool (LDAT). A secondary objective is to use LDAT to determine if the level of difficulty assigned to English as a Second Language (ESL) texts corresponds to increases in grammatical, lexical, and temporal diversity. Other methods of lexical diversity assessment, such as type-token ratio (TTR), have been used with varying accuracy in an effort to determine the complexity or level of texts. We analyzed 120 ESL texts independently assigned by their sources to one of four levels (Beginner, Lower-intermediate, Upper-intermediate, and Advanced). We demonstrated that LDAT significantly reflected the grammatical diversity within these texts. While the findings conflicted with the prediction that grammatical and lexical diversity would increase with assigned level, we concluded that the implementation of LDAT in text design could provide reliable assessments of grammatical diversity.
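The type-token ratio mentioned above as a comparison measure is simple enough to sketch. The tokenizer below (lowercasing and a regular expression over letters and apostrophes) is an assumption chosen for brevity, and the function does not reproduce LDAT, whose internals the abstract does not describe.

```python
import re


def type_token_ratio(text):
    """Number of distinct word forms divided by the total number of words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


sample = "The cat sat on the mat"
print(f"{type_token_ratio(sample):.3f}")
```

TTR tends to fall as a text gets longer, because common words repeat. That length sensitivity is one commonly cited reason for its uneven accuracy.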
Expanding a Catalogue of Deceptive Linguistic Features with NLP Technologies
Duran, Nicholas D. (University of Memphis) | Crossley, Scott A. (Mississippi State University) | Hall, Charles (University of Memphis) | McCarthy, Philip M. (University of Memphis) | McNamara, Danielle S. (University of Memphis)
We evaluate conversational transcripts of deceptive speech using a sophisticated natural language processing tool called Coh-Metrix. Coh-Metrix is unique in that it tracks linguistic features based on social and cognitive factors. The results from Coh-Metrix are compared to linguistic features reported in previous independent deception research, which used a natural language processing tool called LIWC. The comparison provides converging validity for several linguistic features, and establishes new insights on deceptive language.