Generative AI
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Croitoru, Florinel-Alin, Hiji, Andrei-Iulian, Hondru, Vlad, Ristea, Nicolae Catalin, Irofti, Paul, Popescu, Marius, Rusu, Cristian, Ionescu, Radu Tudor, Khan, Fahad Shahbaz, Shah, Mubarak
With the recent advancements in generative modeling, the realism of deepfake content has been increasing at a steady pace, even reaching the point where people often fail to detect manipulated media content online, thus being deceived into various kinds of scams. In this paper, we survey deepfake generation and detection techniques, including the most recent developments in the field, such as diffusion models and Neural Radiance Fields. Our literature review covers all deepfake media types, comprising image, video, audio and multimodal (audio-visual) content. We identify various kinds of deepfakes, according to the procedure used to alter or generate the fake content. We further construct a taxonomy of deepfake generation and detection methods, illustrating the important groups of methods and the domains where these methods are applied. Next, we gather datasets used for deepfake detection and provide updated rankings of the best performing deepfake detectors on the most popular datasets. In addition, we develop a novel multimodal benchmark to evaluate deepfake detectors on out-of-distribution content. The results indicate that state-of-the-art detectors fail to generalize to deepfake content generated by unseen deepfake generators. Finally, we propose future directions to obtain robust and powerful deepfake detectors. Our project page and new benchmark are available at https://github.com/CroitoruAlin/biodeep.
PDDLFuse: A Tool for Generating Diverse Planning Domains
Khandelwal, Vedant, Sheth, Amit, Agostinelli, Forest
Various real-world challenges require planning algorithms that can adapt to a broad range of domains. Traditionally, the creation of planning domains has relied heavily on human implementation, which limits the scale and diversity of available domains. While recent advancements have leveraged generative AI technologies such as large language models (LLMs) for domain creation, these efforts have predominantly focused on translating existing domains from natural language descriptions rather than generating novel ones. In contrast, the concept of domain randomization, which has been highly effective in reinforcement learning, enhances performance and generalizability by training on a diverse array of randomized new domains. Inspired by this success, our tool, PDDLFuse, aims to bridge this gap in Planning Domain Definition Language (PDDL). PDDLFuse is designed to generate new, diverse planning domains that can be used to validate new planners or test foundational planning models. We have developed methods to adjust the domain generator's parameters to modulate the difficulty of the domains it generates. This adaptability is crucial as existing domain-independent planners often struggle with more complex problems. Initial tests indicate that PDDLFuse efficiently creates intricate and varied domains, representing a significant advancement over traditional domain generation methods and making a contribution towards planning research.
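To make the idea of a parameterized domain generator concrete, here is a minimal Python sketch of domain randomization for PDDL. It is a hypothetical illustration only and does not reproduce the PDDLFuse algorithm, which the abstract does not describe. The function name `random_domain` and its parameters (`num_predicates`, `num_actions`, `max_conditions`, `seed`) are assumptions introduced for this example.

```python
import random


def random_domain(name, num_predicates=4, num_actions=3, max_conditions=2, seed=0):
    """Return a randomized STRIPS-style PDDL domain as a string.

    Hypothetical illustration of domain randomization; not the PDDLFuse method.
    Larger num_predicates, num_actions, and max_conditions give bigger domains.
    """
    rng = random.Random(seed)  # seeded so the same arguments give the same domain
    predicates = [f"p{i}" for i in range(num_predicates)]
    lines = [
        f"(define (domain {name})",
        "  (:requirements :strips)",
        "  (:predicates " + " ".join(f"({p} ?x)" for p in predicates) + ")",
    ]
    k = min(max_conditions, num_predicates)
    for i in range(num_actions):
        pre = rng.sample(predicates, rng.randint(1, k))
        add = rng.sample(predicates, rng.randint(1, k))
        # Delete effects come only from predicates that are not added,
        # so an action never both asserts and retracts the same fact.
        deletable = [p for p in predicates if p not in add]
        dele = rng.sample(deletable, rng.randint(0, min(k, len(deletable))))
        effects = [f"({p} ?x)" for p in add] + [f"(not ({p} ?x))" for p in dele]
        precond = " ".join(f"({p} ?x)" for p in pre)
        lines += [
            f"  (:action a{i}",
            "    :parameters (?x)",
            f"    :precondition (and {precond})",
            f"    :effect (and {' '.join(effects)}))",
        ]
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(random_domain("toy", num_actions=2))
```

In this sketch, raising `num_actions` or `max_conditions` yields larger domains with more interacting actions. Whether that makes them harder for a given planner is an empirical question the sketch does not answer.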
'God of management' comes back to life as an AI model
Panasonic Holdings has created an artificial intelligence clone of its late founder Konosuke Matsushita based on his writings, speeches, and over 3,000 voice recordings, the company announced Wednesday. Known as Japan's "god of management," the Panasonic icon is among the figures most respected by the Japanese business community, and comes back to life in digital form to impart wisdom directly to those he never met in person. "As the number of people who received training directly from Matsushita has been on the decline, we decided to use generative AI technology to pass down our group's founding vision to the next generation," the company said in a statement. Co-developed with the University of Tokyo-affiliated Matsuo Institute, the model can reproduce how a person thinks or talks. The company aims to further develop the digital clone to help make business decisions in the future.
SoftBank seeks to buy $1.5 billion in OpenAI shares from employees
SoftBank Group is aiming to increase its stake in OpenAI by acquiring up to $1.5 billion in shares from the startup's employees, according to people familiar with the matter. The company will make a tender offer for the stock, allowing OpenAI employees to cash in shares if they choose. SoftBank contributed $500 million to OpenAI's $6.6 billion fundraising round in October, but had pushed for a larger allocation at the time, said one of the people, asking not to be named because the negotiations aren't public. SoftBank founder Masayoshi Son has vowed to step up investments in artificial intelligence as his Tokyo-based company regains its financial footing after years of missteps. OpenAI, which jumped out to early leadership in the field with its ChatGPT product, was valued at $157 billion in the last fundraising.
Adult learners recall and recognition performance and affective feedback when learning from an AI-generated synthetic video
Li, Zoe Ruo-Yu, Barry, Caswell, Cukurova, Mutlu
The widespread use of generative AI has led to multiple applications of AI-generated text and media to potentially enhance learning outcomes. However, there are a limited number of well-designed experimental studies investigating the impact of learning gains and affective feedback from AI-generated media compared to traditional media (e.g., text from documents and human recordings of video). The current study recruited 500 participants to investigate adult learners' recall and recognition performances as well as their affective feedback on the AI-generated synthetic video, using a mixed-methods approach with a pre- and post-test design. Specifically, four learning conditions, AI-generated framing of human instructor-generated text, AI-generated synthetic videos with human instructor-generated text, human instructor-generated videos, and human instructor-generated text frame (baseline), were considered. The results indicated no statistically significant difference amongst conditions on recall and recognition performance. In addition, the participants' affective feedback was not statistically significantly different between the two video conditions. However, adult learners preferred to learn from the video formats rather than text materials.
Any-Resolution AI-Generated Image Detection by Spectral Learning
Karageorgiou, Dimitrios, Papadopoulos, Symeon, Kompatsiaris, Ioannis, Gavves, Efstratios
Recent works have established that AI models introduce spectral artifacts into generated images and propose approaches for learning to capture them using labeled data. However, the significant differences in such artifacts among different generative models hinder these approaches from generalizing to generators not seen during training. In this work, we build upon the key idea that the spectral distribution of real images constitutes both an invariant and highly discriminative pattern for AI-generated image detection. To model this under a self-supervised setup, we employ masked spectral learning using the pretext task of frequency reconstruction. Since generated images constitute out-of-distribution samples for this model, we propose spectral reconstruction similarity to capture this divergence. Moreover, we introduce spectral context attention, which enables our approach to efficiently capture subtle spectral inconsistencies in images of any resolution. Our spectral AI-generated image detection approach (SPAI) achieves a 5.5% absolute improvement in AUC over the previous state-of-the-art across 13 recent generative approaches, while exhibiting robustness against common online perturbations.
Extracting Training Data from Unconditional Diffusion Models
Chen, Yunhao, Wang, Shujie, Zou, Difan, Ma, Xingjun
As diffusion probabilistic models (DPMs) are being employed as mainstream models for Generative Artificial Intelligence (GenAI), the study of their memorization has attracted growing attention. Existing works in this field aim to establish an understanding of whether or to what extent DPMs learn via memorization. Such an understanding is crucial for identifying potential risks of data leakage and copyright infringement in diffusion models and, more importantly, for trustworthy application of GenAI. Existing works revealed that conditional DPMs are more prone to memorize training data than unconditional DPMs, and most data extraction methods developed so far target conditional DPMs. Although unconditional DPMs are less prone to data extraction, further investigation into these attacks remains essential since they serve as the foundation for conditional models like Stable Diffusion, and exploring these attacks will enhance our understanding of memorization in DPMs. In this work, we propose a novel data extraction method named Surrogate condItional Data Extraction (SIDE) that leverages a time-dependent classifier trained on generated data as surrogate conditions to extract training data from unconditional DPMs. Empirical results demonstrate that it can extract training data in challenging scenarios where previous methods fail, and it is, on average, over 50% more effective across different scales of the CelebA dataset. Furthermore, we provide a theoretical understanding of memorization in both conditional and unconditional DPMs and why SIDE is effective.
OpenAI suspends access to Sora video generation tool after artists protest
Earlier this year OpenAI unveiled Sora, a text-to-video AI model, showing off detailed scenes and complex camera motion from relatively simple prompts. It's been radio silence since then, but the company recently granted artists free early access to the tool for testing. However, a group of around 20 of those just leaked access to Sora in protest, saying they were acting as "PR puppets," prompting OpenAI to suspend access, The Washington Post reported. "We received access to Sora with the promise to be early testers, red teamers and creative partners. However, we believe instead we are being lured into 'art washing' to tell the world that Sora is a useful tool for artists," the group wrote on the AI art repository site, Hugging Face.
What Google Off-loading Chrome Would Mean for Users
Using "the Internet" sometimes seems disconcertingly synonymous with using Google. Google Search, the most popular search engine on the planet, indexes the open Internet, driving traffic to Web sites, and Google Ads provides the revenue that publishers survive on. Gmail is how some two billion people receive their e-mail; many Gmail in-boxes have been accumulating messages for a decade or more. Last, but certainly not least, the company's browser, Google Chrome, is what a staggering three billion people use to navigate the Internet. According to some estimates, Google holds nearly ninety per cent market share in search engines in the U.S. Chrome, in turn, provides the audience data that Google's ads leverage to target users, and links the company's other services together.
Generative Visual Communication in the Era of Vision-Language Models
Visual communication, dating back to prehistoric cave paintings, is the use of visual elements to convey ideas and information. In today's visually saturated world, effective design demands an understanding of graphic design principles, visual storytelling, human psychology, and the ability to distill complex information into clear visuals. This dissertation explores how recent advancements in vision-language models (VLMs) can be leveraged to automate the creation of effective visual communication designs. Although generative models have made great progress in generating images from text, they still struggle to simplify complex ideas into clear, abstract visuals and are constrained by pixel-based outputs, which lack flexibility for many design tasks. To address these challenges, we constrain the models' operational space and introduce task-specific regularizations. We explore various aspects of visual communication, namely, sketches and visual abstraction, typography, animation, and visual inspiration.