A Survey on Large Language Model Hallucination via a Creativity Perspective

arXiv.org Artificial Intelligence

Hallucinations in large language models (LLMs) are usually seen as limitations. However, could they also be a source of creativity? This survey explores that possibility, suggesting that hallucinations may benefit LLM applications by fostering creativity. It begins with a review of the taxonomy of hallucinations and their negative impact on LLM reliability in critical applications. Then, through historical examples and recent relevant theories, the survey explores the potential creative benefits of hallucinations in LLMs. To elucidate the value and evaluation criteria of this connection, we delve into the definitions and assessment methods of creativity. Following the framework of divergent and convergent thinking phases, the survey systematically reviews the literature on transforming and harnessing hallucinations for creativity in LLMs. Finally, the survey discusses future research directions, emphasizing the need to further explore and refine the application of hallucinations in creative processes within LLMs.


Automatic Assessment of Divergent Thinking in Chinese Language with TransDis: A Transformer-Based Language Model Approach

arXiv.org Artificial Intelligence

Language models have become increasingly popular for automatic creativity assessment, generating semantic distances to objectively measure the quality of creative ideas. However, there is currently no automatic assessment system for evaluating creative ideas in the Chinese language. To address this gap, we developed TransDis, a scoring system using transformer-based language models, capable of providing valid originality (quality) and flexibility (variety) scores for Alternative Uses Task (AUT) responses in Chinese. Study 1 demonstrated that the latent model-rated originality factor, comprising three transformer-based models, strongly predicted human originality ratings, and that model-rated flexibility strongly correlated with human flexibility ratings as well. Criterion validity analyses indicated that model-rated originality and flexibility positively correlated with other creativity measures, demonstrating similar validity to human ratings. Studies 2 and 3 showed that TransDis effectively distinguished participants instructed to provide creative vs. common uses (Study 2) and participants instructed to generate ideas in a flexible vs. persistent way (Study 3). Our findings suggest that TransDis can be a reliable and low-cost tool for measuring idea originality and flexibility in the Chinese language, potentially paving the way for automatic creativity assessment in other languages. We offer an open platform to compute originality and flexibility for AUT responses in Chinese and over 50 other languages (https://osf.io/59jv2/).
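
The abstract above does not spell out how semantic distance becomes a score, so the following is only a minimal sketch of one common reading: originality as the cosine distance between an AUT prompt and each response, and flexibility as the mean cosine distance between consecutive responses. The function names and the toy vectors are hypothetical and stand in for embeddings that a transformer model would produce; this is not the TransDis implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def originality(prompt_vec, response_vecs):
    """Assumed scoring rule: distance of each response from the prompt."""
    return [cosine_distance(prompt_vec, r) for r in response_vecs]


def flexibility(response_vecs):
    """Assumed scoring rule: mean distance between consecutive responses."""
    if len(response_vecs) < 2:
        return 0.0  # a single response has no category shifts to measure
    dists = [
        cosine_distance(a, b)
        for a, b in zip(response_vecs, response_vecs[1:])
    ]
    return sum(dists) / len(dists)


# Toy 3-dimensional "embeddings" (made up for illustration only).
prompt = [1.0, 0.0, 0.0]       # e.g. the object "brick"
responses = [
    [0.9, 0.1, 0.0],           # a common use, close to the prompt
    [0.1, 0.9, 0.2],           # a more remote use
    [0.0, 0.2, 0.9],           # another remote use in a different direction
]

print("originality:", originality(prompt, responses))
print("flexibility:", flexibility(responses))
```

In practice the vectors would come from a sentence-embedding model, and a system like the one described would combine scores from several models; both choices are outside what the abstract specifies.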


Toward an Intelligent Agent for Fraud Detection — The CFE Agent

AAAI Conferences

One of the primary realms into which artificial intelligence research has ventured is that of psychometric tests. Ever since Alan Turing proposed the Turing Test, it has been debated whether performance on tests should serve as the metric by which we determine whether a machine is intelligent. This is an idea that may either solidify or challenge, depending on the reader's predisposition, one's sense of what artificial intelligence really is. As will be discussed in this paper, there is a history of efforts to create agents that perform well on tests in the spirit of an interpretation of artificial intelligence called "Psychometric AI". The focus of this paper, however, is to describe a machine agent, hereafter called the CFE Agent, developed in this tradition. The CFE Exam is a gateway to certification in the Association of Certified Fraud Examiners (ACFE), a widely recognized professional credential within the fraud examiner profession. The CFE Agent attempts to emulate the successful performance of a human test taker, using what would appear to be simplistic natural language processing approaches to answer test questions. It is also hoped that the reader will be convinced that the same core technologies can be successfully applied within the larger domain of fraud detection. Further work will also be briefly discussed, in which we attempt to take these techniques to a deeper level, giving a better sense of the knowledge the agent is using and how that knowledge is being applied to formulate answers.