Generative AI
Deepfakes and Higher Education: A Research Agenda and Scoping Review of Synthetic Media
The pace of development of Artificial Intelligence (AI) technologies has led to significant concern in many areas of society, including educational contexts. As a result, research agendas on Generative AI (GenAI) in tertiary education have been established (Lodge et al., 2023); however, to date, no review or research agenda has specifically focused on deepfakes in tertiary education. Deepfakes are GenAI outputs which comprise realistic audio, visual, or media outputs that depict false or inaccurate information (Akhtar, 2023). The major consequence of deepfakes is that they can portray an individual doing or saying something that they never did or said, marking an unprecedented shift in the ability to distort reality (Appel & Prietzel, 2022). As tertiary education institutions are centres of learning, the potential implications of such false information are highly important for students, teachers, and university leadership, thus warranting stakeholder attention.
It's the End of the Web as We Know It
The web has become so interwoven with everyday life that it is easy to forget what an extraordinary accomplishment and treasure it is. In just a few decades, much of human knowledge has been collectively written up and made available to anyone with an internet connection. But all of this is coming to an end. The advent of AI threatens to destroy the complex online ecosystem that allows writers, artists, and other creators to reach human audiences. To understand why, you must understand publishing.
In the Shadow of Smith's Invisible Hand: Risks to Economic Stability and Social Wellbeing in the Age of Intelligence
Occhipinti, Jo-An, Hynes, William, Prodan, Ante, Eyre, Harris A., Green, Roy, Burrow, Sharan, Tanner, Marcel, Buchanan, John, Ujdur, Goran, Destrebecq, Frederic, Song, Christine, Carnevale, Steven, Hickie, Ian B., Heffernan, Mark
Work is fundamental to societal prosperity and mental health, providing financial security, identity, purpose, and social integration. The emergence of generative artificial intelligence (AI) has catalysed debate on job displacement. Some argue that many new jobs and industries will emerge to offset the displacement, while others foresee a widespread decoupling of economic productivity from human input threatening jobs on an unprecedented scale. This study explores the conditions under which both may be true and examines the potential for a self-reinforcing cycle of recessionary pressures that would necessitate sustained government intervention to maintain job security and economic stability. A system dynamics model was developed to undertake ex ante analysis of the effect of AI-capital deepening on labour underutilisation and demand in the economy. Results indicate that even a moderate increase in the AI-capital-to-labour ratio could increase labour underutilisation to double its current level, decrease per capita disposable income by 26% (95% interval, 20.6% - 31.8%), and decrease the consumption index by 21% (95% interval, 13.6% - 28.3%) by mid-2050. To prevent a reduction in per capita disposable income due to the estimated increase in underutilization, at least a 10.8-fold increase in the new job creation rate would be necessary. Results demonstrate the feasibility of an AI-capital-to-labour ratio threshold beyond which even high rates of new job creation cannot prevent declines in consumption. The precise threshold will vary across economies, emphasizing the urgent need for empirical research tailored to specific contexts. This study underscores the need for governments, civic organisations, and business to work together to ensure a smooth transition to an AI-dominated economy to safeguard the Mental Wealth of nations.
U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI
Šarčević, Tanja, Karlowicz, Alicja, Mayer, Rudolf, Baeza-Yates, Ricardo, Rauber, Andreas
Large Generative AI (GAI) models have the unparalleled ability to generate text, images, audio, and other forms of media that are increasingly indistinguishable from human-generated content. As these models often train on publicly available data, including copyrighted materials, art and other creative works, they inadvertently risk violating copyright and misappropriation of intellectual property (IP). Due to the rapid development of generative AI technology and pressing ethical considerations from stakeholders, protective mechanisms and techniques are emerging at a high pace but lack systematisation. In this paper, we study the concerns regarding the intellectual property rights of training data and specifically focus on the properties of generative models that enable misuse leading to potential IP violations. Then we propose a taxonomy that leads to a systematic review of technical solutions for safeguarding the data from intellectual property violations in GAI.
How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO
Ng, Man Tik, Tse, Hui Tung, Huang, Jen-tse, Li, Jingjing, Wang, Wenxuan, Lyu, Michael R.
The role-play ability of Large Language Models (LLMs) has emerged as a popular research direction. However, existing studies focus on imitating well-known public figures or fictional characters, overlooking the potential for simulating ordinary individuals. Such an oversight limits the potential for advancements in digital human clones and non-player characters in video games. To bridge this gap, we introduce ECHO, an evaluative framework inspired by the Turing test. This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses. Notably, our framework focuses on emulating average individuals rather than historical or fictional figures, presenting a unique advantage to apply the Turing Test. We evaluated three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models, alongside the online application GPTs from OpenAI. Our results demonstrate that GPT-4 more effectively deceives human evaluators, and GPTs achieves a leading success rate of 48.3%. Furthermore, we investigated whether LLMs could discern between human-generated and machine-generated texts. While GPT-4 can identify differences, it could not determine which texts were human-produced. Our code and results of reproducing the role-playing LLMs are made publicly available via https://github.com/CUHK-ARISE/ECHO.
Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication
Lawson, John R., Flora, Montgomery L., Goebbert, Kevin H., Lyman, Seth N., Potvin, Corey K., Schultz, David M., Stepanek, Adam J., Trujillo-Falcón, Joseph E.
Generative AI, such as OpenAI's GPT-4V large-language model, has rapidly entered mainstream discourse. Novel capabilities in image processing and natural-language communication may augment existing forecasting methods. Large language models further display potential to better communicate weather hazards in a style honed for diverse communities and different languages. This study evaluates GPT-4V's ability to interpret meteorological charts and communicate weather hazards appropriately to the user, despite challenges of hallucinations, where generative AI delivers coherent, confident, but incorrect responses. We assess GPT-4V's competence via its web interface ChatGPT in two tasks: (1) generating a severe-weather outlook from weather-chart analysis and conducting self-evaluation, revealing an outlook that corresponds well with a Storm Prediction Center human-issued forecast; and (2) producing hazard summaries in Spanish and English from weather charts. Responses in Spanish, however, resemble direct (not idiomatic) translations from English to Spanish, yielding poorly translated summaries that lose critical idiomatic precision required for optimal communication. Our findings advocate for cautious integration of tools like GPT-4V in meteorology, underscoring the necessity of human oversight and development of trustworthy, explainable AI.
AI-Generated Faces in the Real World: A Large-Scale Case Study of Twitter Profile Images
Ricker, Jonas, Assenmacher, Dennis, Holz, Thorsten, Fischer, Asja, Quiring, Erwin
Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking. In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that 0.052% were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.
The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking
Li, Yuying, Liu, Zeyan, Zhao, Junyi, Ren, Liangqin, Li, Fengjun, Luo, Jiebo, Luo, Bo
Generative AI models can produce high-quality images based on text prompts. The generated images often appear indistinguishable from images generated by conventional optical photography devices or created by human artists (i.e., real images). While the outstanding performance of such generative models is generally well received, security concerns arise. For instance, such image generators could be used to facilitate fraud or scam schemes, generate and spread misinformation, or produce fabricated artworks. In this paper, we present a systematic attempt at understanding and detecting AI-generated images (AI-art) in adversarial scenarios. First, we collect and share a dataset of real images and their corresponding artificial counterparts generated by four popular AI image generators. The dataset, named ARIA, contains over 140K images in five categories: artworks (painting), social media images, news photos, disaster scenes, and anime pictures. This dataset can be used as a foundation to support future research on adversarial AI-art. Next, we present a user study that employs the ARIA dataset to evaluate whether real-world users can distinguish AI-generated images from real ones, with or without reference images. In a benchmarking study, we further evaluate whether state-of-the-art open-source and commercial AI image detectors can effectively identify the images in the ARIA dataset. Finally, we present a ResNet-50 classifier and evaluate its accuracy and transferability on the ARIA dataset.
Towards Better Text-to-Image Generation Alignment via Attention Modulation
Wu, Yihang, Cao, Xiao, Li, Kaixin, Chen, Zitan, Wang, Haonan, Meng, Lei, Huang, Zhiyong
In text-to-image generation tasks, advances in diffusion models have improved the fidelity of generated results. However, these models encounter challenges when processing text prompts containing multiple entities and attributes. The uneven distribution of attention results in entity leakage and attribute misalignment. Training from scratch to address this issue requires large amounts of labeled data and is resource-consuming. Motivated by this, we propose an attribution-focusing mechanism, a training-free, phase-wise mechanism that modulates attention in diffusion models. One of our core ideas is to guide the model to concentrate on the corresponding syntactic components of the prompt at distinct timesteps. To achieve this, we incorporate a temperature control mechanism within the early phases of the self-attention modules to mitigate entity leakage issues. An object-focused masking scheme and a phase-wise dynamic weight control mechanism are integrated into the cross-attention modules, enabling the model to discern the affiliation of semantic information between entities more effectively. The experimental results in various alignment scenarios demonstrate that our model attains better image-text alignment with minimal additional computational cost.
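The abstract above mentions temperature control in attention but does not say how the temperature is applied or in which direction it is tuned. As a rough illustration only, the sketch below shows generic temperature-scaled softmax attention in plain Python; it is not the authors' implementation, and the function names, the toy query and keys, and the temperature values are all assumptions made for this example.

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax over a list of scores after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, temperature=1.0):
    """Scaled dot-product attention weights of one query over several keys."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    return softmax(scores, temperature)


# Hypothetical toy data: one query attending over three keys.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# A temperature below 1 sharpens the distribution; above 1 flattens it.
sharp = attention_weights(query, keys, temperature=0.5)
flat = attention_weights(query, keys, temperature=2.0)
```

In this toy setting, a lower temperature concentrates the attention weight on the best-matching key, while a higher one spreads it more evenly across keys. Which of the two behaviours the paper relies on to reduce entity leakage is not stated in the abstract.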
Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images
Naseh, Ali, Thai, Katherine, Iyyer, Mohit, Houmansadr, Amir
With the digital imagery landscape rapidly evolving, image stocks and AI-generated image marketplaces have become central to visual media. Traditional stock images now exist alongside innovative platforms that trade in prompts for AI-generated visuals, driven by sophisticated APIs like DALL-E 3 and Midjourney. This paper studies the possibility of employing multi-modal models with enhanced visual understanding to mimic the outputs of these platforms, introducing an original attack strategy. Our method leverages fine-tuned CLIP models, a multi-label classifier, and the descriptive capabilities of GPT-4V to create prompts that generate images similar to those available in marketplaces and from premium stock image providers, yet at a markedly lower expense. In presenting this strategy, we aim to spotlight a new class of economic and security considerations within the realm of digital imagery. Our findings, supported by both automated metrics and human assessment, reveal that comparable visual content can be produced for a fraction of the prevailing market prices ($0.23 - $0.27 per image), emphasizing the need for awareness and strategic discussions about the integrity of digital media in an increasingly AI-integrated landscape. Our work also contributes to the field by assembling a dataset consisting of approximately 19 million prompt-image pairs generated by the popular Midjourney platform, which we plan to release publicly.