 Generative AI


Enhancing Privacy in ControlNet and Stable Diffusion via Split Learning

arXiv.org Artificial Intelligence

With the emerging trend of large generative models, ControlNet is introduced to enable users to fine-tune pre-trained models with their own data for various use cases. A natural question arises: how can we train ControlNet models while ensuring users' data privacy across distributed devices? Exploring different distributed training schemes, we find conventional federated learning and split learning unsuitable. Instead, we propose a new distributed learning structure that eliminates the need for the server to send gradients back. Through a comprehensive evaluation of existing threats, we discover that in the context of training ControlNet with split learning, most existing attacks are ineffective, except for two mentioned in previous literature. To counter these threats, we leverage the properties of diffusion models and design a new timestep sampling policy during forward processes. We further propose a privacy-preserving activation function and a method to prevent private text prompts from leaving clients, tailored for image generation with diffusion models. Our experimental results demonstrate that our algorithms and systems greatly enhance the efficiency of distributed training for ControlNet while ensuring users' data privacy without compromising image generation quality.


The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot

arXiv.org Artificial Intelligence

Generative AI (GenAI) has been shown to enhance individual productivity in a guided setting. While it is also likely to transform processes in a collaborative work setting, it is unclear what trajectory this transformation will follow. A collaborative environment is characterized by a blend of origination tasks that involve building something from scratch and iteration tasks that involve refining others' work. Whether GenAI affects these two aspects of collaborative work and to what extent is an open empirical question. We study this question within the open-source development landscape, a prime example of collaborative innovation, where contributions are voluntary and unguided. Specifically, we focus on the launch of GitHub Copilot in October 2021 and leverage a natural experiment in which GitHub Copilot (a programming-focused LLM) selectively rolled out support for Python, but not for R. We observe a significant jump in overall contributions, suggesting that GenAI effectively augments collaborative innovation in an unguided setting. Interestingly, Copilot's launch increased maintenance-related contributions, which are mostly iterative tasks involving building on others' work, significantly more than code-development contributions, which are mostly origination tasks involving standalone contributions. This disparity was exacerbated in active projects with extensive coding activity, raising concerns that, as GenAI models improve to accommodate richer context, the gap between origination and iterative solutions may widen. We discuss practical and policy implications to incentivize high-value innovative solutions.


UGAD: Universal Generative AI Detector utilizing Frequency Fingerprints

arXiv.org Artificial Intelligence

In the wake of a fabricated explosion image at the Pentagon, the ability to discern real images from fake counterparts has never been more critical. Our study introduces a novel multi-modal approach to detect AI-generated images amidst the proliferation of new generation methods such as diffusion models. Our method, UGAD, encompasses three key detection steps: First, we transform the RGB images into YCbCr channels and apply an Integral Radial Operation to emphasize salient radial features. Second, the Spatial Fourier Extraction operation is used for a spatial shift, utilizing a pre-trained deep learning network for optimal feature extraction. Finally, the deep neural network classification stage processes the data through dense layers using softmax for classification. Our approach significantly enhances the accuracy of differentiating between real and AI-generated images, as evidenced by a 12.64% increase in accuracy and 28.43% increase in AUC compared to existing state-of-the-art methods.
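As a rough illustration of the color-space conversion named in the first UGAD step, the sketch below converts 8-bit RGB pixels to YCbCr. It assumes the full-range JFIF (BT.601) coefficients, which the abstract does not specify, and it does not attempt the Integral Radial Operation or any later stage. The function names `rgb_to_ycbcr` and `convert_image` are hypothetical and not taken from the paper.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Assumption: JFIF / BT.601 coefficients. The paper may use a
    different variant. Results are left as unclamped floats.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def convert_image(pixels):
    """Apply the conversion to a nested list of (r, g, b) tuples."""
    return [[rgb_to_ycbcr(*px) for px in row] for row in pixels]


# A 1x2 toy image: one white pixel and one black pixel.
toy_image = [[(255, 255, 255), (0, 0, 0)]]
converted = convert_image(toy_image)
```

Gray pixels such as white and black land on the neutral chroma value of 128 in both Cb and Cr, so only the luma channel Y distinguishes them.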


Iterative CT Reconstruction via Latent Variable Optimization of Shallow Diffusion Models

arXiv.org Artificial Intelligence

Image-generative artificial intelligence (AI) has garnered significant attention in recent years. In particular, the diffusion model, a core component of generative AI, produces high-quality images with rich diversity. In this study, we proposed a novel computed tomography (CT) reconstruction method by combining the denoising diffusion probabilistic model with iterative CT reconstruction. In sharp contrast to previous studies, we optimized the fidelity loss of CT reconstruction with respect to the latent variable of the diffusion model, instead of the image and model parameters. To suppress the changes in anatomical structures produced by the diffusion model, we shallowed the diffusion and reverse processes and fixed a set of added noises in the reverse process to make it deterministic during the inference. We demonstrated the effectiveness of the proposed method through the sparse-projection CT reconstruction of 1/10 projection data. Despite the simplicity of the implementation, the proposed method has the potential to reconstruct high-quality images while preserving the patient's anatomical structures and was found to outperform existing methods, including iterative reconstruction, iterative reconstruction with total variation, and the diffusion model alone in terms of quantitative indices such as the structural similarity index and peak signal-to-noise ratio. We also explored further sparse-projection CT reconstruction using 1/20 projection data with the same trained diffusion model. As the number of iterations increased, the image quality improved to a level comparable to that of 1/10 sparse-projection CT reconstruction. In principle, this method can be widely applied not only to CT but also to other imaging modalities.
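The central idea above, minimizing the data-fidelity loss with respect to a latent variable while the generator stays fixed, can be sketched with a toy example. Everything here is an assumption made for illustration: a small linear map `W` stands in for the deterministic reverse process of a trained diffusion model, picking out two of three "pixels" stands in for sparse CT projections, and plain gradient descent stands in for whatever optimizer the authors use. None of the names come from the paper.

```python
# Toy stand-in for a frozen, deterministic generator: image = W z.
# (Assumption: the real method uses a trained diffusion model instead.)
W = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
MEASURED = [0, 1]  # indices of the image entries that are observed


def generate(z):
    """Map a latent vector z to a 3-entry 'image'."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def measure(image):
    """Toy forward operator: keep only the observed entries."""
    return [image[i] for i in MEASURED]


def fidelity_loss(z, y):
    """Squared error between the measured generator output and the data y."""
    return sum((m - t) ** 2 for m, t in zip(measure(generate(z)), y))


def fidelity_grad(z, y):
    """Gradient of fidelity_loss with respect to the latent z."""
    residual = [m - t for m, t in zip(measure(generate(z)), y)]
    grad = [0.0] * len(z)
    for r, i in zip(residual, MEASURED):
        for j in range(len(z)):
            grad[j] += 2.0 * r * W[i][j]
    return grad


def reconstruct(y, steps=200, lr=0.2):
    """Gradient descent on z only; the generator W is never updated."""
    z = [0.0, 0.0]
    for _ in range(steps):
        g = fidelity_grad(z, y)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return generate(z)


z_true = [1.0, -2.0]
y = measure(generate(z_true))  # simulated sparse measurements
image = reconstruct(y)
```

In this toy setup the third image entry is never measured, yet it is recovered because every candidate image must pass through the generator. That mirrors, in a very simplified way, how a generative prior can fill in information that sparse projections leave out.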


Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization

arXiv.org Artificial Intelligence

Since the advent of generative artificial intelligence, every company and researcher has been rushing to develop their own generative models, whether commercial or not. Given the large number of users of these powerful new tools, there is currently no intrinsically verifiable way to explain from the ground up what happens when LLMs (large language models) learn. For example, models based on automatic speech recognition systems have to rely on huge amounts of data collected from all over the web to produce fast and efficient results. In this article, we develop a backdoor attack called MarketBackFinal 2.0, which is based on acoustic data poisoning and draws mainly on modern stock market models, in order to show the possible vulnerabilities of speech-based transformers that may rely on LLMs.


Trustworthy, Responsible, and Safe AI: A Comprehensive Architectural Framework for AI Safety with Challenges and Mitigations

arXiv.org Artificial Intelligence

AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. With the rapid proliferation of AI and especially with the recent advancement of Generative AI (or GAI), the technology ecosystem behind the design, development, adoption, and deployment of AI systems has drastically changed, broadening the scope of AI Safety to address impacts on public safety and national security. In this paper, we propose a novel architectural framework for understanding and analyzing AI Safety, defining its characteristics from three perspectives: Trustworthy AI, Responsible AI, and Safe AI. We provide an extensive review of current research and advancements in AI safety from these perspectives, highlighting their key challenges and mitigation approaches. Through examples from state-of-the-art technologies, particularly Large Language Models (LLMs), we present innovative mechanisms, methodologies, and techniques for designing and testing AI safety. Our goal is to promote advancement in AI safety research, and ultimately enhance people's trust in digital transformation.


Risks When Sharing LoRA Fine-Tuned Diffusion Model Weights

arXiv.org Artificial Intelligence

With the emerging trend in generative models and convenient public access to diffusion models pre-trained on large datasets, users can fine-tune these models to generate images of personal faces or items in new contexts described by natural language. Parameter efficient fine-tuning (PEFT) such as Low Rank Adaptation (LoRA) has become the most common way to save memory and computation usage on the user end during fine-tuning. However, a natural question is whether the private images used for fine-tuning will be leaked to adversaries when sharing model weights. In this paper, we study the issue of privacy leakage of a fine-tuned diffusion model in a practical setting, where adversaries only have access to model weights, rather than prompts or images used for fine-tuning. We design and build a variational network autoencoder that takes model weights as input and outputs the reconstruction of private images. To improve the efficiency of training such an autoencoder, we propose a training paradigm with the help of timestep embedding. The results give a surprising answer to this research question: an adversary can generate images containing the same identities as the private images. Furthermore, we demonstrate that no existing defense method, including differential privacy-based methods, can preserve the privacy of private data used for fine-tuning a diffusion model without compromising the utility of a fine-tuned model.


This New Tech Puts AI In Touch with Its Emotions--and Yours

WIRED

A new "empathic voice interface" launched today by Hume AI, a New York–based startup, makes it possible to add a range of emotionally expressive voices, plus an emotionally attuned ear, to large language models from Anthropic, Google, Meta, Mistral, and OpenAI--portending an era when AI helpers may more routinely get all gushy on us. "We specialize in building empathic personalities that speak in ways people would speak, rather than stereotypes of AI assistants," says Hume AI cofounder Alan Cowen, a psychologist who has coauthored a number of research papers on AI and emotion, and who previously worked on emotional technologies at Google and Facebook. WIRED tested Hume's latest voice technology, called EVI 2, and found its output to be similar to that developed by OpenAI for ChatGPT. Later, a real movie star, Scarlett Johansson, claimed OpenAI had ripped off her voice. Like ChatGPT, Hume is far more emotionally expressive than most conventional voice interfaces. If you tell it that your pet has died, for example, it will adopt a suitably somber and sympathetic tone.


Meta scraped every Australian user's account to train its AI

Engadget

In a government inquiry about AI adoption in Australia, Meta's global privacy director Melinda Claybaugh was asked whether her company has been collecting Australians' data to train its generative AI technology. According to ABC News, Claybaugh initially denied the claim, but upon being pressed, she ultimately admitted that Meta scrapes all the photos and texts in all Facebook and Instagram posts from as far back as 2007, unless the user had set their posts to private. Further, she admitted that the company isn't offering Australians an opt-out option like it does to users in the European Union. Claybaugh said that Meta doesn't scrape the accounts of users under 18 years old, but she admitted that the company still collects their photos and other information if they're posted on their parents' or guardians' accounts. She couldn't answer, however, if the company collects data from previous years once a user turns 18. Upon being asked why Meta doesn't offer Australians the option not to consent to data collection, Claybaugh said that it exists in the EU "in response to a very specific legal frame," which most likely pertains to the bloc's General Data Protection Regulation (GDPR).


US senators urge regulators to probe potential AI antitrust violations

Engadget

The US government has noticed the potentially negative effects of generative AI on areas like journalism and content creation. Senator Amy Klobuchar, along with seven Democrat colleagues, urged the Federal Trade Commission (FTC) and Justice Department to probe generative AI products like ChatGPT for potential antitrust violations, they wrote in a press release. "Recently, multiple dominant online platforms have introduced new generative AI features that answer user queries by summarizing, or, in some cases, merely regurgitating online content from other sources or platforms," the letter states. "The introduction of these new generative AI features further threatens the ability of journalists and other content creators to earn compensation for their vital work." The lawmakers went on to note that traditional search results lead users to publishers' websites while AI-generated summaries keep the users on the search platform "where that platform alone can profit from the user's attention through advertising and data collection."