Plotting

 Wang, Li


FaceSwapGuard: Safeguarding Facial Privacy from DeepFake Threats through Identity Obfuscation

arXiv.org Artificial Intelligence

DeepFakes pose a significant threat to our society. One representative DeepFake application is face-swapping, which replaces the identity in a facial image with that of a victim. Although existing methods partially mitigate these risks by degrading the quality of swapped images, they often fail to disrupt the identity transformation effectively. To fill this gap, we propose FaceSwapGuard (FSG), a novel black-box defense mechanism against deepfake face-swapping threats. Specifically, FSG introduces imperceptible perturbations to a user's facial image, disrupting the features extracted by identity encoders. When shared online, these perturbed images mislead face-swapping techniques, causing them to generate facial images with identities significantly different from the original user. Extensive experiments demonstrate the effectiveness of FSG against multiple face-swapping techniques, reducing the face match rate from 90\% (without defense) to below 10\%. Both qualitative and quantitative studies further confirm its ability to confuse human perception, highlighting its practical utility. Additionally, we investigate key factors that may influence FSG and evaluate its robustness against various adaptive adversaries.


PDA: Generalizable Detection of AI-Generated Images via Post-hoc Distribution Alignment

arXiv.org Artificial Intelligence

The rapid advancement of generative models has led to the proliferation of highly realistic AI-generated images, posing significant challenges for detection methods to generalize across diverse and evolving generative techniques. Existing approaches often fail to adapt to unknown models without costly retraining, limiting their practicability. To fill this gap, we propose Post-hoc Distribution Alignment (PDA), a novel approach for the generalizable detection for AI-generated images. The key idea is to use the known generative model to regenerate undifferentiated test images. This process aligns the distributions of the re-generated real images with the known fake images, enabling effective distinction from unknown fake images. PDA employs a two-step detection framework: 1) evaluating whether a test image aligns with the known fake distribution based on deep k-nearest neighbor (KNN) distance, and 2) re-generating test images using known generative models to create pseudo-fake images for further classification. This alignment strategy allows PDA to effectively detect fake images without relying on unseen data or requiring retraining. Extensive experiments demonstrate the superiority of PDA, achieving 96.73\% average accuracy across six state-of-the-art generative models, including GANs, diffusion models, and text-to-image models, and improving by 16.07\% over the best baseline. Through t-SNE visualizations and KNN distance analysis, we provide insights into PDA's effectiveness in separating real and fake images. Our work provides a flexible and effective solution for real-world fake image detection, advancing the generalization ability of detection systems.


Overview of the Amphion Toolkit (v0.2)

arXiv.org Artificial Intelligence

Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation tasks and models. In this report, we introduce Amphion v0.2, the second major release developed in 2024. This release features a 100K-hour open-source multilingual dataset, a robust data preparation pipeline, and novel models for tasks such as text-to-speech, audio coding, and voice conversion. Furthermore, the report includes multiple tutorials that guide users through the functionalities and usage of the newly released models.


DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices

arXiv.org Artificial Intelligence

DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices Xian Peng, Xin Wu, Lianming Xu, Li Wang and Aiguo Fei School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, China School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China Abstract --Distributed DNN inference is becoming increasingly important as the demand for intelligent services at the network edge grows. By leveraging the power of distributed computing, edge devices can perform complicated and resource-hungry inference tasks previously only possible on powerful servers, enabling new applications in areas such as autonomous vehicles, industrial automation, and smart homes. However, it is challenging to achieve accurate and efficient distributed edge inference due to the fluctuating nature of the actual resources of the devices and the processing difficulty of the input data. In this work, we propose DistrEE, a distributed DNN inference framework that can exit model inference early to meet specific quality of service requirements. In particular, the framework firstly integrates model early exit and distributed inference for multi-node collaborative inferencing scenarios. Furthermore, it designs an early exit policy to control when the model inference terminates.


Exploring the latent space of diffusion models directly through singular value decomposition

arXiv.org Artificial Intelligence

Despite the groundbreaking success of diffusion models in generating high-fidelity images, their latent space remains relatively under-explored, even though it holds significant promise for enabling versatile and interpretable image editing capabilities. The complicated denoising trajectory and high dimensionality of the latent space make it extremely challenging to interpret. Existing methods mainly explore the feature space of U-Net in Diffusion Models (DMs) instead of the latent space itself. In contrast, we directly investigate the latent space via Singular Value Decomposition (SVD) and discover three useful properties that can be used to control generation results without the requirements of data collection and maintain identity fidelity generated images. Based on these properties, we propose a novel image editing framework that is capable of learning arbitrary attributes from one pair of latent codes destined by text prompts in Stable Diffusion Models. To validate our approach, extensive experiments are conducted to demonstrate its effectiveness and flexibility in image editing. We will release our codes soon to foster further research and applications in this area.


Language model driven: a PROTAC generation pipeline with dual constraints of structure and property

arXiv.org Artificial Intelligence

The imperfect modeling of ternary complexes has limited the application of computer-aided drug discovery tools in PROTAC research and development. In this study, an AI-assisted approach for PROTAC molecule design pipeline named LM-PROTAC was developed, which stands for language model driven Proteolysis Targeting Chimera, by embedding a transformer-based generative model with dual constraints on structure and properties, referred to as the DCT. This study utilized the fragmentation representation of molecules and developed a language model driven pipeline. Firstly, a language model driven affinity model for protein compounds to screen molecular fragments with high affinity for the target protein. Secondly, structural and physicochemical properties of these fragments were constrained during the generation process to meet specific scenario requirements. Finally, a two-round screening of the preliminary generated molecules using a multidimensional property prediction model to generate a batch of PROTAC molecules capable of degrading disease-relevant target proteins for validation in vitro experiments, thus achieving a complete solution for AI-assisted PROTAC drug generation. Taking the tumor key target Wnt3a as an example, the LM-PROTAC pipeline successfully generated PROTAC molecules capable of inhibiting Wnt3a. The results show that DCT can efficiently generate PROTAC that targets and hydrolyses Wnt3a.


Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions. Adversarial purification has been an effective means to improve DNNs robustness by removing these perturbations before feeding the data into the model. However, it faces significant challenges in preserving key structural and semantic information of data, as the imperceptible nature of adversarial perturbations makes it hard to avoid over-correcting, which can destroy important information and degrade model performance. In this paper, we break away from traditional adversarial purification methods by focusing on the clean data manifold. To this end, we reveal that samples generated by a well-trained generative model are close to clean ones but far from adversarial ones. Leveraging this insight, we propose Consistency Model-based Adversarial Purification (CMAP), which optimizes vectors within the latent space of a pre-trained consistency model to generate samples for restoring clean data. Specifically, 1) we propose a \textit{Perceptual consistency restoration} mechanism by minimizing the discrepancy between generated samples and input samples in both pixel and perceptual spaces. 2) To maintain the optimized latent vectors within the valid data manifold, we introduce a \textit{Latent distribution consistency constraint} strategy to align generated samples with the clean data distribution. 3) We also apply a \textit{Latent vector consistency prediction} scheme via an ensemble approach to enhance prediction reliability. CMAP fundamentally addresses adversarial perturbations at their source, providing a robust purification. Extensive experiments on CIFAR-10 and ImageNet-100 show that our CMAP significantly enhances robustness against strong adversarial attacks while preserving high natural accuracy.


Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities

arXiv.org Artificial Intelligence

One-shot voice conversion (VC) aims to alter the timbre of speech from a source speaker to match that of a target speaker using just a single reference speech from the target, while preserving the semantic content of the original source speech. Despite advancements in one-shot VC, its effectiveness decreases in real-world scenarios where reference speeches, often sourced from the internet, contain various disturbances like background noise. To address this issue, we introduce Noro, a Noise Robust One-shot VC system. Noro features innovative components tailored for VC using noisy reference speeches, including a dual-branch reference encoding module and a noise-agnostic contrastive speaker loss. Experimental results demonstrate that Noro outperforms our baseline system in both clean and noisy scenarios, highlighting its efficacy for real-world applications. Additionally, we investigate the hidden speaker representation capabilities of our baseline system by repurposing its reference encoder as a speaker encoder. The results shows that it is competitive with several advanced self-supervised learning models for speaker representation under the SUPERB settings, highlighting the potential for advancing speaker representation learning through one-shot VC task.


A Comparative Study of Deep Reinforcement Learning for Crop Production Management

arXiv.org Artificial Intelligence

Crop production management is essential for optimizing yield and minimizing a field's environmental impact to crop fields, yet it remains challenging due to the complex and stochastic processes involved. Recently, researchers have turned to machine learning to address these complexities. Specifically, reinforcement learning (RL), a cutting-edge approach designed to learn optimal decision-making strategies through trial and error in dynamic environments, has emerged as a promising tool for developing adaptive crop management policies. RL models aim to optimize long-term rewards by continuously interacting with the environment, making them well-suited for tackling the uncertainties and variability inherent in crop management. Studies have shown that RL can generate crop management policies that compete with, and even outperform, expert-designed policies within simulation-based crop models. In the gym-DSSAT crop model environment, one of the most widely used simulators for crop management, proximal policy optimization (PPO) and deep Q-networks (DQN) have shown promising results. However, these methods have not yet been systematically evaluated under identical conditions. In this study, we evaluated PPO and DQN against static baseline policies across three different RL tasks, fertilization, irrigation, and mixed management, provided by the gym-DSSAT environment. To ensure a fair comparison, we used consistent default parameters, identical reward functions, and the same environment settings. Our results indicate that PPO outperforms DQN in fertilization and irrigation tasks, while DQN excels in the mixed management task. This comparative analysis provides critical insights into the strengths and limitations of each approach, advancing the development of more effective RL-based crop management strategies.


A Field Calibration Approach for Triaxial MEMS Gyroscopes Based on Gravity and Rotation Consistency

arXiv.org Artificial Intelligence

This paper developed an efficient method for calibrating triaxial MEMS gyroscopes, which can be effectively utilized in the field environment. The core strategy is to utilize the criterion that the dot product of the measured gravity and the rotation speed in a fixed frame remains constant. To eliminate the impact of external acceleration, the calibration process involves separate procedures for measuring local gravity and rotation speed. Moreover, unlike existing approaches for auto calibration of triaxial sensors that often result in nonlinear optimization problems, the proposed method simplifies the estimation of the gyroscope scale factor by employing a linear least squares algorithm. Extensive numerical simulations have been conducted to analyze the proposed method's performance in calibrating the six-parameter triaxial gyroscope model, taking into consideration measurements corrupted by simulated noise. Experimental validation was also carried out using two commercially available MEMS inertial measurement units (LSM9DS1) and a servo motor. The experimental results effectively demonstrate the efficacy of the proposed calibration approach.