Generative AI
A linguistic analysis of undesirable outcomes in the era of generative AI
Gambetta, Daniele, Gezici, Gizem, Giannotti, Fosca, Pedreschi, Dino, Knott, Alistair, Pappalardo, Luca
Recent research has focused on the medium and long-term impacts of generative AI, posing scientific and societal challenges mainly due to the detection and reliability of machine-generated information, which is projected to form the major content on the Web soon. Prior studies show that LLMs exhibit a lower performance in generation tasks (model collapse) as they undergo a fine-tuning process across multiple generations on their own generated content (self-consuming loop). In this paper, we present a comprehensive simulation framework built upon the chat version of LLama2, focusing particularly on the linguistic aspects of the generated content, which has not been fully examined in existing studies. Our results show that the model produces less lexical rich content across generations, reducing diversity. The lexical richness has been measured using the linguistic measures of entropy and TTR as well as calculating the POSTags frequency. The generated content has also been examined with an $n$-gram analysis, which takes into account the word order, and semantic networks, which consider the relation between different words. These findings suggest that the model collapse occurs not only by decreasing the content diversity but also by distorting the underlying linguistic patterns of the generated text, which both highlight the critical importance of carefully choosing and curating the initial input text, which can alleviate the model collapse problem. Furthermore, we conduct a qualitative analysis of the fine-tuned models of the pipeline to compare their performances on generic NLP tasks to the original model. We find that autophagy transforms the initial model into a more creative, doubtful and confused one, which might provide inaccurate answers and include conspiracy theories in the model responses, spreading false and biased information on the Web.
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization
Wang, Xingqi, Yi, Xiaoyuan, Xie, Xing, Jia, Jia
Recent advancements in diffusion models trained on large-scale data have enabled the generation of indistinguishable human-level images, yet they often produce harmful content misaligned with human values, e.g., social bias, and offensive content. Despite extensive research on Large Language Models (LLMs), the challenge of Text-to-Image (T2I) model alignment remains largely unexplored. Addressing this problem, we propose LiVO (Lightweight Value Optimization), a novel lightweight method for aligning T2I models with human values. LiVO only optimizes a plug-and-play value encoder to integrate a specified value principle with the input prompt, allowing the control of generated images over both semantics and values. Specifically, we design a diffusion model-tailored preference optimization loss, which theoretically approximates the Bradley-Terry model used in LLM alignment but provides a more flexible trade-off between image quality and value conformity. To optimize the value encoder, we also develop a framework to automatically construct a text-image preference dataset of 86k (prompt, aligned image, violating image, value principle) samples. Without updating most model parameters and through adaptive value selection from the input prompt, LiVO significantly reduces harmful outputs and achieves faster convergence, surpassing several strong baselines and taking an initial step towards ethically aligned T2I models.
OpenAI says ChatGPT treats us all the same (most of the time)
Bias in AI is a huge problem. Ethicists have long studied the impact of bias when companies use AI models to screen résumés or loan applications, for example--instances of what the OpenAI researchers call third-person fairness. But the rise of chatbots, which enable individuals to interact with models directly, brings a new spin to the problem. "We wanted to study how it shows up in ChatGPT in particular," Alex Beutel, a researcher at OpenAI, told MIT Technology Review in an exclusive preview of results published today. Instead of screening a résumé you've already written, you might ask ChatGPT to write one for you, says Beutel: "If it knows my name, how does that affect the response?"
Adobe brings generative AI video to Premiere Pro
Adobe is now adding its AI-based video generator, Firefly, to its video editing software Premiere Pro. The Firefly model can be used to extend a video clip or generate video from still images or text instructions. This was first brought to our attention by The Verge. The Generative Extend tool will initially be available in beta and can extend the length of a video clip by up to two seconds with an image resolution of 720p or 1080p and a refresh rate of 24 frames-per-second. This tool will also be applicable to ambient sounds and sound effects, but not to music or speech.
Security of and by Generative AI platforms
Hayagreevan, Hari, Khamaru, Souvik
This whitepaper highlights the dual importance of securing generative AI (genAI) platforms and leveraging genAI for cybersecurity. As genAI technologies proliferate, their misuse poses significant risks, including data breaches, model tampering, and malicious content generation. Securing these platforms is critical to protect sensitive data, ensure model integrity, and prevent adversarial attacks. Simultaneously, genAI presents opportunities for enhancing security by automating threat detection, vulnerability analysis, and incident response. The whitepaper explores strategies for robust security frameworks around genAI systems, while also showcasing how genAI can empower organizations to anticipate, detect, and mitigate sophisticated cyber threats.
Facing Identity: The Formation and Performance of Identity via Face-Based Artificial Intelligence Technologies
How is identity constructed and performed in the digital via face-based artificial intelligence technologies? While questions of identity on the textual Internet have been thoroughly explored, the Internet has progressed to a multimedia form that not only centers the visual, but specifically the face. At the same time, a wealth of scholarship has and continues to center the topics of surveillance and control through facial recognition technologies (FRTs), which have extended the logics of the racist pseudoscience of physiognomy. Much less work has been devoted to understanding how such face-based artificial intelligence technologies have influenced the formation and performance of identity. This literature review considers how such technologies interact with faciality, which entails the construction of what a face may represent or signify, along axes of identity such as race, gender, and sexuality. In grappling with recent advances in AI such as image generation and deepfakes, I propose that we are now in an era of "post-facial" technologies that build off our existing culture of facility while eschewing the analog face, complicating our relationship with identity vis-á-vis the face. Drawing from previous frameworks of identity play in the digital, as well as trans practices that have historically played with or transgressed the boundaries of identity classification, we can develop concepts adequate for analyzing digital faciality and identity given the current landscape of post-facial artificial intelligence technologies that allow users to interface with the digital in an entirely novel manner. To ground this framework of transgression, I conclude by proposing an interview study with VTubers -- online streamers who perform using motion-captured avatars instead of their real-life faces -- to gain qualitative insight on the experience and perceptions of users of post-facial technologies and how these sociotechnical experiences interface with our relationships with identity and the digital anew.
Generative AI's aggregated knowledge versus web-based curated knowledge
his paper explores what kinds of questions are best served by the way generative AI (GenAI) using Large Language Models(LLMs) that aggregate and package knowledge, and when traditional curated web-sourced search results serve users better. An experiment compared product searches using ChatGPT, Google search engine, or both helped us understand more about the compelling nature of generated responses. The experiment showed GenAI can speed up some explorations and decisions. We describe how search can deepen the testing of facts, logic, and context. We show where existing and emerging knowledge paradigms can help knowledge exploration in different ways. Experimenting with searches, our probes showed the value for curated web search provides for very specific, less popularly-known knowledge. GenAI excelled at bringing together knowledge for broad, relatively well-known topics. The value of curated and aggregated knowledge for different kinds of knowledge reflected in different user goals. We developed a taxonomy to distinguishing when users are best served by these two approaches.
De-jargonizing Science for Journalists with GPT-4: A Pilot Study
Nishal, Sachita, Lee, Eric, Diakopoulos, Nicholas
This study offers an initial evaluation of a human-in-the-loop system leveraging GPT-4 (a large language model or LLM), and Retrieval-Augmented Generation (RAG) to identify and define jargon terms in scientific abstracts, based on readers' self-reported knowledge. The system achieves fairly high recall in identifying jargon and preserves relative differences in readers' jargon identification, suggesting personalization as a feasible use-case for LLMs to support sense-making of complex information. Surprisingly, using only abstracts for context to generate definitions yields slightly more accurate and higher quality definitions than using RAG-based context from the fulltext of an article. The findings highlight the potential of generative AI for assisting science reporters, and can inform future work on developing tools to simplify dense documents.
M2Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes
Yan, Sixu, Zhang, Zeyu, Han, Muzhi, Wang, Zaijin, Xie, Qi, Li, Zhitian, Li, Zhehan, Liu, Hangxin, Wang, Xinggang, Zhu, Song-Chun
Recent advances in diffusion models have opened new avenues for research into embodied AI agents and robotics. Despite significant achievements in complex robotic locomotion and skills, mobile manipulation-a capability that requires the coordination of navigation and manipulation-remains a challenge for generative AI techniques. This is primarily due to the high-dimensional action space, extended motion trajectories, and interactions with the surrounding environment. In this paper, we introduce M2Diffuser, a diffusion-based, scene-conditioned generative model that directly generates coordinated and efficient whole-body motion trajectories for mobile manipulation based on robot-centric 3D scans. M2Diffuser first learns trajectory-level distributions from mobile manipulation trajectories provided by an expert planner. Crucially, it incorporates an optimization module that can flexibly accommodate physical constraints and task objectives, modeled as cost and energy functions, during the inference process. This enables the reduction of physical violations and execution errors at each denoising step in a fully differentiable manner. Through benchmarking on three types of mobile manipulation tasks across over 20 scenes, we demonstrate that M2Diffuser outperforms state-of-the-art neural planners and successfully transfers the generated trajectories to a real-world robot. Our evaluations underscore the potential of generative AI to enhance the generalization of traditional planning and learning-based robotic methods, while also highlighting the critical role of enforcing physical constraints for safe and robust execution.
Synthetic Interlocutors. Experiments with Generative AI to Prolong Ethnographic Encounters
Søltoft, Johan Irving, Kocksch, Laura, Munk, Anders Kristian
This paper introduces "Synthetic Interlocutors" for ethnographic research. Synthetic Interlocutors are chatbots ingested with ethnographic textual material (interviews and observations) by using Retrieval Augmented Generation (RAG). We integrated an open-source large language model with ethnographic data from three projects to explore two questions: Can RAG digest ethnographic material and act as ethnographic interlocutor? And, if so, can Synthetic Interlocutors prolong encounters with the field and extend our analysis? Through reflections on the process of building our Synthetic Interlocutors and an experimental collaborative workshop, we suggest that RAG can digest ethnographic materials, and it might lead to prolonged, yet uneasy ethnographic encounters that allowed us to partially recreate and re-visit fieldwork interactions while facilitating opportunities for novel analytic insights. Synthetic Interlocutors can produce collaborative, ambiguous and serendipitous moments.