Wang, Can
Recent Advances on Generalizable Diffusion-generated Image Detection
Xu, Qijie, Chen, Defang, Chen, Jiawei, Lyu, Siwei, Wang, Can
The rise of diffusion models has significantly improved the fidelity and diversity of generated images. With numerous benefits, these advancements also introduce new risks. Diffusion models can be exploited to create high-quality Deepfake images, which poses challenges for image authenticity verification. In recent years, research on generalizable diffusion-generated image detection has grown rapidly. However, a comprehensive review of this topic is still lacking. To bridge this gap, we present a systematic survey of recent advances and classify them into two main categories: (1) data-driven detection and (2) feature-driven detection. Existing detection methods are further classified into six fine-grained categories based on their underlying principles. Finally, we identify several open challenges and envision some future directions, with the hope of inspiring more research work on this important topic. Reviewed works in this survey can be found at https://github.com/zju-pi/
Uncertainty-Aware Graph Structure Learning
Han, Shen, Zhou, Zhiyao, Chen, Jiawei, Hao, Zhezheng, Zhou, Sheng, Wang, Gang, Feng, Yan, Chen, Chun, Wang, Can
Graph Neural Networks (GNNs) have become a prominent approach for learning from graph-structured data. However, their effectiveness can be significantly compromised when the graph structure is suboptimal. To address this issue, Graph Structure Learning (GSL) has emerged as a promising technique that refines node connections adaptively. Nevertheless, we identify two key limitations in existing GSL methods: 1) Most methods primarily focus on node similarity to construct relationships, while overlooking the quality of node information. Blindly connecting low-quality nodes and aggregating their ambiguous information can degrade the performance of other nodes. 2) The constructed graph structures are often constrained to be symmetric, which may limit the model's flexibility and effectiveness. To overcome these limitations, we propose an Uncertainty-aware Graph Structure Learning (UnGSL) strategy. UnGSL estimates the uncertainty of node information and utilizes it to adjust the strength of directional connections, where the influence of nodes with high uncertainty is adaptively reduced.Importantly, UnGSL serves as a plug-in module that can be seamlessly integrated into existing GSL methods with minimal additional computational cost. In our experiments, we implement UnGSL into six representative GSL methods, demonstrating consistent performance improvements. The code is available at https://github.com/UnHans/UnGSL.
PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation
Yang, Weiqin, Chen, Jiawei, Xin, Xin, Zhou, Sheng, Hu, Binbin, Feng, Yan, Chen, Chun, Wang, Can
Softmax Loss (SL) is widely applied in recommender systems (RS) and has demonstrated effectiveness. This work analyzes SL from a pairwise perspective, revealing two significant limitations: 1) the relationship between SL and conventional ranking metrics like DCG is not sufficiently tight; 2) SL is highly sensitive to false negative instances. Our analysis indicates that these limitations are primarily due to the use of the exponential function. To address these issues, this work extends SL to a new family of loss functions, termed Pairwise Softmax Loss (PSL), which replaces the exponential function in SL with other appropriate activation functions. While the revision is minimal, we highlight three merits of PSL: 1) it serves as a tighter surrogate for DCG with suitable activation functions; 2) it better balances data contributions; and 3) it acts as a specific BPR loss enhanced by Distributionally Robust Optimization (DRO).
Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data
Wang, Can, Sui, Dianbo, Sun, Hongliang, Ding, Hao, Zhang, Bolin, Tu, Zhiying
However, the success of ICL varies depending on the task and context, leading to heterogeneous service quality. Directly estimating the performance of LLM services at each invocation can be laborious, especially requiring abundant labeled data or internal information within the LLM. This paper introduces a novel method to estimate the performance of LLM services across different tasks and contexts, which can be "plug-and-play" utilizing only a few unlabeled samples like ICL. Our findings suggest that the negative log-likelihood and perplexity derived from LLM service invocation can function as effective and significant features. Based on these features, we utilize four distinct meta-models to estimate the performance of LLM services. Our proposed method is compared against unlabeled estimation baselines across multiple LLM services and tasks. And it is experimentally applied to two scenarios, demonstrating its effectiveness in the selection and further optimization of LLM services.
Conditional Image Synthesis with Diffusion Models: A Survey
Zhan, Zheyuan, Chen, Defang, Mei, Jian-Ping, Zhao, Zhenghe, Chen, Jiawei, Chen, Chun, Lyu, Siwei, Wang, Can
Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches in the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the essential sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems to be solved in the future and suggest some possible solutions. Our reviewed works are itemized at https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models.
Towards Dynamic Graph Neural Networks with Provably High-Order Expressive Power
Wang, Zhe, Zhao, Tianjian, Zhang, Zhen, Chen, Jiawei, Zhou, Sheng, Feng, Yan, Chen, Chun, Wang, Can
Dynamic Graph Neural Networks (DyGNNs) have garnered increasing research attention for learning representations on evolving graphs. Despite their effectiveness, the limited expressive power of existing DyGNNs hinders them from capturing important evolving patterns of dynamic graphs. Although some works attempt to enhance expressive capability with heuristic features, there remains a lack of DyGNN frameworks with provable and quantifiable high-order expressive power. To address this research gap, we firstly propose the k-dimensional Dynamic WL tests (k-DWL) as the referencing algorithms to quantify the expressive power of DyGNNs. We demonstrate that the expressive power of existing DyGNNs is upper bounded by the 1-DWL test. To enhance the expressive power, we propose Dynamic Graph Neural Network with High-order expressive power (HopeDGN), which updates the representation of central node pair by aggregating the interaction history with neighboring node pairs. Our theoretical results demonstrate that HopeDGN can achieve expressive power equivalent to the 2-DWL test. We then present a Transformer-based implementation for the local variant of HopeDGN. Experimental results show that HopeDGN achieved performance improvements of up to 3.12%, demonstrating the effectiveness of HopeDGN.
Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model
Zhong, Hongliang, Wang, Can, Zhang, Jingbo, Liao, Jing
Generating and inserting new objects into 3D content is a compelling approach for achieving versatile scene recreation. Existing methods, which rely on SDS optimization or single-view inpainting, often struggle to produce high-quality results. To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting. Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained stable video diffusion model to facilitate view-consistent object inpainting. Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation. After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine Gaussian Splatting reconstruction from these sparse inpainted views. By leveraging these fabricate techniques, our approach yields diverse results, ensures view-consistent and harmonious insertions, and produces better object quality. Extensive experiments demonstrate that our approach outperforms existing methods.
Generating 6-D Trajectories for Omnidirectional Multirotor Aerial Vehicles in Cluttered Environments
Liu, Peiyan, Shen, Yuanzhe, Liu, Yueqian, Quan, Fengyu, Wang, Can, Chen, Haoyao
As fully-actuated systems, omnidirectional multirotor aerial vehicles (OMAVs) have more flexible maneuverability and advantages in aggressive flight in cluttered environments than traditional underactuated MAVs. %Due to the high dimensionality of configuration space, making the designed trajectory generation algorithm efficient is challenging. This paper aims to achieve safe flight of OMAVs in cluttered environments. Considering existing static obstacles, an efficient optimization-based framework is proposed to generate 6-D $SE(3)$ trajectories for OMAVs. Given the kinodynamic constraints and the 3D collision-free region represented by a series of intersecting convex polyhedra, the proposed method finally generates a safe and dynamically feasible 6-D trajectory. First, we parameterize the vehicle's attitude into a free 3D vector using stereographic projection to eliminate the constraints inherent in the $SO(3)$ manifold, while the complete $SE(3)$ trajectory is represented as a 6-D polynomial in time without inherent constraints. The vehicle's shape is modeled as a cuboid attached to the body frame to achieve whole-body collision evaluation. Then, we formulate the origin trajectory generation problem as a constrained optimization problem. The original constrained problem is finally transformed into an unconstrained one that can be solved efficiently. To verify the proposed framework's performance, simulations and real-world experiments based on a tilt-rotor hexarotor aerial vehicle are carried out.
Motif-driven Subgraph Structure Learning for Graph Classification
Zhou, Zhiyao, Zhou, Sheng, Mao, Bochao, Chen, Jiawei, Sun, Qingyun, Feng, Yan, Chen, Chun, Wang, Can
To mitigate the suboptimal nature of graph structure, Graph Structure Learning (GSL) has emerged as a promising approach to improve graph structure and boost performance in downstream tasks. Despite the proposal of numerous GSL methods, the progresses in this field mostly concentrated on node-level tasks, while graph-level tasks (e.g., graph classification) remain largely unexplored. Notably, applying node-level GSL to graph classification is non-trivial due to the lack of find-grained guidance for intricate structure learning. Inspired by the vital role of subgraph in graph classification, in this paper we explore the potential of subgraph structure learning for graph classification by tackling the challenges of key subgraph selection and structure optimization. We propose a novel Motif-driven Subgraph Structure Learning method for Graph Classification (MOSGSL). Specifically, MOSGSL incorporates a subgraph structure learning module which can adaptively select important subgraphs. A motif-driven structure guidance module is further introduced to capture key subgraph-level structural patterns (motifs) and facilitate personalized structure learning. Extensive experiments demonstrate a significant and consistent improvement over baselines, as well as its flexibility and generalizability for various backbones and learning procedures.
How Do Recommendation Models Amplify Popularity Bias? An Analysis from the Spectral Perspective
Lin, Siyi, Gao, Chongming, Chen, Jiawei, Zhou, Sheng, Hu, Binbin, Feng, Yan, Chen, Chun, Wang, Can
Recommendation Systems (RS) are often plagued by popularity bias. When training a recommendation model on a typically long-tailed dataset, the model tends to not only inherit this bias but often exacerbate it, resulting in over-representation of popular items in the recommendation lists. This study conducts comprehensive empirical and theoretical analyses to expose the root causes of this phenomenon, yielding two core insights: 1) Item popularity is memorized in the principal spectrum of the score matrix predicted by the recommendation model; 2) The dimension collapse phenomenon amplifies the relative prominence of the principal spectrum, thereby intensifying the popularity bias. Building on these insights, we propose a novel debiasing strategy that leverages a spectral norm regularizer to penalize the magnitude of the principal singular value. We have developed an efficient algorithm to expedite the calculation of the spectral norm by exploiting the spectral property of the score matrix. Extensive experiments across seven real-world datasets and three testing paradigms have been conducted to validate the superiority of the proposed method.