Liu, Bo
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Deng, Yufan, Guo, Xun, Wang, Yizhi, Fang, Jacob Zhiyuan, Wang, Angtian, Yuan, Shenghai, Yang, Yiding, Liu, Bo, Huang, Haibin, Ma, Chongyang
Video generation has witnessed remarkable progress with the advent of deep generative models, particularly diffusion models. While existing methods excel in generating high-quality videos from text prompts or single images, personalized multi-subject video generation remains a largely unexplored challenge. This task involves synthesizing videos that incorporate multiple distinct subjects, each defined by separate reference images, while ensuring temporal and spatial consistency. Current approaches primarily rely on mapping subject images to keywords in text prompts, which introduces ambiguity and limits their ability to model subject relationships effectively. In this paper, we propose CINEMA, a novel framework for coherent multi-subject video generation that leverages a Multimodal Large Language Model (MLLM). Our approach eliminates the need for explicit correspondences between subject images and text entities, mitigating ambiguity and reducing annotation effort. By leveraging an MLLM to interpret subject relationships, our method facilitates scalability, enabling the use of large and diverse datasets for training. Furthermore, our framework can be conditioned on varying numbers of subjects, offering greater flexibility in personalized content creation. Through extensive evaluations, we demonstrate that our approach significantly improves subject consistency and overall video coherence, paving the way for advanced applications in storytelling, interactive media, and personalized video generation.
Do Fairness Interventions Come at the Cost of Privacy: Evaluations for Binary Classifiers
Tian, Huan, Zhang, Guangsheng, Liu, Bo, Zhu, Tianqing, Ding, Ming, Zhou, Wanlei
While in-processing fairness approaches show promise in mitigating biased predictions, their potential impact on privacy leakage remains under-explored. We aim to address this gap by assessing the privacy risks of fairness-enhanced binary classifiers via membership inference attacks (MIAs) and attribute inference attacks (AIAs). Surprisingly, our results reveal that enhancing fairness does not necessarily lead to privacy compromises; in fact, these fairness interventions exhibit increased resilience against MIAs and AIAs. This is because fairness interventions tend to remove sensitive information among extracted features and reduce confidence scores for the majority of training data in order to produce fairer predictions. However, during the evaluations, we uncover a potential threat mechanism that exploits prediction discrepancies between fair and biased models, leading to advanced attack results for both MIAs and AIAs. This mechanism reveals potent vulnerabilities of fair models and poses significant privacy risks to current fairness methods. Extensive experiments across multiple datasets, attack methods, and representative fairness approaches confirm our findings and demonstrate the efficacy of the uncovered mechanism. Our study exposes the under-explored privacy threats in fairness studies, advocating for thorough evaluations of potential security vulnerabilities before model deployment.
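The discrepancy-exploiting threat described above can be illustrated with a minimal sketch: an attacker with black-box access to both a biased baseline model and its fairness-enhanced counterpart flags samples whose confidence scores shift sharply between the two. The threshold, sample IDs, and confidence values below are hypothetical stand-ins, not the paper's actual models or attack.

```python
# Sketch of a discrepancy-based membership inference attack (MIA),
# assuming black-box access to per-sample confidence scores from a
# biased model and its fairness-enhanced counterpart.

def discrepancy_score(conf_biased, conf_fair):
    """A large gap between the two models' confidences suggests the
    fairness intervention altered this sample's treatment, which can
    leak membership information."""
    return abs(conf_biased - conf_fair)

def infer_membership(samples, threshold=0.2):
    """Flag samples whose prediction discrepancy exceeds a threshold."""
    return [s["id"] for s in samples
            if discrepancy_score(s["conf_biased"], s["conf_fair"]) > threshold]

# Hypothetical confidence scores for three query samples.
queries = [
    {"id": "a", "conf_biased": 0.95, "conf_fair": 0.60},  # large shift
    {"id": "b", "conf_biased": 0.55, "conf_fair": 0.50},  # small shift
    {"id": "c", "conf_biased": 0.40, "conf_fair": 0.85},  # large shift
]

print(infer_membership(queries))  # → ['a', 'c']
```

A real attack would calibrate the threshold on shadow models; the point here is only that the *difference* between the two models carries signal that neither model leaks alone.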
Differentiable Information Enhanced Model-Based Reinforcement Learning
Zhang, Xiaoyuan, Cai, Xinyan, Liu, Bo, Huang, Weidong, Zhu, Song-Chun, Qi, Siyuan, Yang, Yaodong
Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information that facilitates gradient-based methods. In comparison to prevailing model-free reinforcement learning approaches, model-based reinforcement learning (MBRL) methods exhibit the potential to effectively harness the power of differentiable information for recovering the underlying physical dynamics. However, this presents two primary challenges: effectively utilizing differentiable information to 1) construct models with more accurate dynamic prediction and 2) enhance the stability of policy training. In this paper, we propose a Differentiable Information Enhanced MBRL method, MB-MIX, to address both challenges. Firstly, we adopt a Sobolev model training approach that penalizes incorrect model gradient outputs, enhancing prediction accuracy and yielding more precise models that faithfully capture system dynamics. Secondly, we introduce mixing lengths of truncated learning windows to reduce the variance in policy gradient estimation, resulting in improved stability during policy learning. To validate the effectiveness of our approach in differentiable environments, we provide theoretical analysis and empirical results. Notably, our approach outperforms previous model-based and model-free methods in multiple challenging tasks involving controllable rigid robots, such as humanoid motion control, as well as deformable object manipulation.
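The Sobolev training idea above (supervising model gradients as well as model outputs) can be sketched on a toy problem: fit a scalar model f(x) = w·x to the target y = 3x while also penalizing the mismatch between df/dx and the true derivative. The linear target, the loss weight, and the finite-difference optimizer are illustrative assumptions, not the paper's dynamics models.

```python
# Minimal sketch of a Sobolev-style loss: value error plus a
# gradient-mismatch penalty, minimized by plain gradient descent.

def sobolev_loss(w, xs, lam=1.0):
    # Ground-truth dynamics (toy): y = 3x, so dy/dx = 3 everywhere.
    # Model: f(x) = w * x, so df/dx = w.
    value_err = sum((w * x - 3.0 * x) ** 2 for x in xs) / len(xs)
    grad_err = (w - 3.0) ** 2              # Sobolev (gradient) term
    return value_err + lam * grad_err

def fit(xs, lr=0.05, steps=200):
    w = 0.0
    for _ in range(steps):
        # Central finite difference of the loss w.r.t. w.
        eps = 1e-5
        g = (sobolev_loss(w + eps, xs) - sobolev_loss(w - eps, xs)) / (2 * eps)
        w -= lr * g
    return w

w_hat = fit([0.5, 1.0, 2.0])
print(round(w_hat, 3))  # converges toward the true slope 3.0
```

With lam = 0 the same code recovers ordinary value-only regression; the extra term is what forces the learned model's derivatives to agree with the environment's, which is exactly the information a differentiable simulator exposes.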
Vehicle Top Tag Assisted Vehicle-Road Cooperative Localization For Autonomous Public Buses
Li, Hao, Sun, Yifei, Liu, Bo, Wang, Linbin
Accurate vehicle localization is indispensable to autonomous vehicles but is difficult to realize in complicated application scenarios. Intersection scenarios, which suffer from environmental shielding and crowded dynamic objects, are especially crucial and challenging. To handle difficult intersection scenarios, the methodology of vehicle top tag assisted vehicle-road cooperative localization (or, for short, vehicle top tag assisted localization) is proposed. The proposed methodology has the merit of satisfying feasibility, reliability, explainability, and societal and economic concerns. Concrete solutions for vehicle top tag detection and vehicle top tag localization, which instantiate the core part of the proposed methodology, are presented. Simulation results are provided to demonstrate the effectiveness of the presented solutions. The proposed methodology of vehicle top tag assisted localization also has the potential to be extended to a much wider range of practical applications beyond our intended ones involving autonomous public buses. State-of-the-art (SOTA) vehicle localization systems normally rely on certain exteroceptive sensors such as GNSS, LiDAR, and vision systems (cameras), augmented by proprioceptive sensors such as the IMU. Relevant methods can be mainly categorized into GNSS-based, LiDAR-based, and vision-based ones; these categories are not mutually exclusive.
Improving Adaptive Moment Optimization via Preconditioner Diagonalization
Nguyen, Son, Liu, Bo, Chen, Lizhang, Liu, Qiang
Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based on estimates of gradient statistics. Compared to traditional algorithms like Stochastic Gradient Descent, these adaptive methods are typically more robust to model scale and hyperparameter tuning. However, the gradient statistics employed by these methods often do not leverage sufficient gradient covariance information, leading to suboptimal updates in certain directions of the parameter space and potentially slower convergence. In this work, we keep track of such covariance statistics in the form of a structured preconditioner matrix. Unlike other works, our approach does not apply direct approximations to estimate this matrix. We instead implement an invertible transformation that maps the preconditioner matrix into a new space where it becomes approximately diagonal. This enables a diagonal approximation of the preconditioner matrix in the transformed space, offering several computational advantages. Empirical results show that our approach can substantially enhance the convergence speed of modern adaptive optimizers. Notably, for large language models like LLaMA, we can achieve a speedup of 2x compared to the baseline Adam. Additionally, our method can be integrated with memory-efficient optimizers like Adafactor to manage computational overhead.
Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge
Imran, Muhammad, Krebs, Jonathan R., Sivaraman, Vishal Balaji, Zhang, Teng, Kumar, Amarjeet, Ueland, Walker R., Fassler, Michael J., Huang, Jinlong, Sun, Xiao, Wang, Lisheng, Shi, Pengcheng, Rokuss, Maximilian, Baumgartner, Michael, Kirchhof, Yannick, Maier-Hein, Klaus H., Isensee, Fabian, Liu, Shuolin, Han, Bing, Nguyen, Bong Thanh, Shin, Dong-jin, Ji-Woo, Park, Choi, Mathew, Uhm, Kwang-Hyun, Ko, Sung-Jea, Lee, Chanwoong, Chun, Jaehee, Kim, Jin Sung, Zhang, Minghui, Zhang, Hanxiao, You, Xin, Gu, Yun, Pan, Zhaohong, Liu, Xuan, Liang, Xiaokun, Tiefenthaler, Markus, Almar-Munoz, Enrique, Schwab, Matthias, Kotyushev, Mikhail, Epifanov, Rostislav, Wodzinski, Marek, Muller, Henning, Qayyum, Abdul, Mazher, Moona, Niederer, Steven A., Wang, Zhiwei, Yang, Kaixiang, Ren, Jintao, Korreman, Stine Sofia, Gao, Yuchong, Zeng, Hongye, Zheng, Haoyu, Zheng, Rui, Yue, Jinghua, Zhou, Fugen, Liu, Bo, Cosman, Alexander, Liang, Muxuan, Zhao, Chang, Upchurch, Gilbert R. Jr., Ma, Jun, Zhou, Yuyin, Cooper, Michol A., Shao, Wei
Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently available to support the development of multi-class aortic segmentation methods. To address this gap, we organized the AortaSeg24 MICCAI Challenge, introducing the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This dataset was designed to facilitate both model development and validation. The challenge attracted 121 teams worldwide, with participants leveraging state-of-the-art frameworks such as nnU-Net and exploring novel techniques, including cascaded models, data augmentation strategies, and custom loss functions. We evaluated the submitted algorithms using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD), highlighting the approaches adopted by the top five performing teams. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms. The annotated dataset, evaluation code, and implementations of the leading methods are publicly available to support further research. All resources can be accessed at https://aortaseg24.grand-challenge.org.
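For reference, the Dice Similarity Coefficient used to score submissions is computed per class over the predicted and ground-truth label maps. The sketch below uses tiny hand-made 1-D label lists purely for illustration; the challenge evaluates 3-D CTA volumes.

```python
# Per-class Dice Similarity Coefficient (DSC) on flattened label maps:
# DSC = 2 * |pred ∩ gt| / (|pred| + |gt|), computed for one class.

def dice(pred, gt, cls):
    p = [v == cls for v in pred]
    g = [v == cls for v in gt]
    inter = sum(a and b for a, b in zip(p, g))
    denom = sum(p) + sum(g)
    return 1.0 if denom == 0 else 2.0 * inter / denom

# Two toy label maps with three labels (0 = background, 1 and 2 =
# two hypothetical aortic zones).
pred = [0, 1, 1, 2, 2, 0]
gt   = [0, 1, 2, 2, 2, 0]

scores = {c: round(dice(pred, gt, c), 3) for c in (1, 2)}
print(scores)  # → {1: 0.667, 2: 0.8}
```

In a multi-class setting like AortaSeg24's 23 branches and zones, such per-class scores are typically averaged; the companion NSD metric additionally measures boundary agreement within a tolerance, which plain Dice does not capture.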
Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI
Ye, Dayong, Zhu, Tianqing, Wang, Shang, Liu, Bo, Zhang, Leo Yu, Zhou, Wanlei, Zhang, Yang
Generative AI technology has become increasingly integrated into our daily lives, offering powerful capabilities to enhance productivity. However, these same capabilities can be exploited by adversaries for malicious purposes. While existing research on adversarial applications of generative AI predominantly focuses on cyberattacks, less attention has been given to attacks targeting deep learning models. In this paper, we introduce the use of generative AI for facilitating model-related attacks, including model extraction, membership inference, and model inversion. Our study reveals that adversaries can launch a variety of model-related attacks against both image and text models in a data-free and black-box manner, achieving comparable performance to baseline methods that have access to the target models' training data and parameters in a white-box manner. This research serves as an important early warning to the community about the potential risks associated with generative AI-powered attacks on deep learning models.
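The data-free, black-box setting described above can be sketched in miniature: the adversary never sees real training data, but queries the target on synthetic inputs and fits a surrogate to the returned labels. The 1-D threshold "model", the random query generator standing in for a generative model, and the brute-force surrogate fit below are all toy assumptions.

```python
import random

# Sketch of data-free model extraction: query a black-box target on
# synthetic inputs, then fit a surrogate to the observed labels.

def target_model(x):
    """Black-box target; internals (boundary at x = 0.5) are unknown
    to the adversary, who only sees the returned label."""
    return 1 if 2.0 * x - 1.0 > 0 else 0

random.seed(0)
# Synthetic queries; in the generative-AI setting these would come
# from a generator rather than uniform sampling.
queries = [random.uniform(-2, 2) for _ in range(200)]
labels = [target_model(x) for x in queries]

# Fit a 1-D threshold surrogate: pick the boundary among the queried
# points that misclassifies the fewest observed labels.
best_t = min(sorted(queries),
             key=lambda t: sum((x > t) != (y == 1)
                               for x, y in zip(queries, labels)))
print(round(best_t, 2))  # close to the target's true boundary at 0.5
```

Even this crude surrogate recovers the decision boundary to within the spacing of the queries, which is the core of why black-box label access alone suffices for extraction.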
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning
Ye, Dayong, Zhu, Tianqing, Li, Jiayang, Gao, Kun, Liu, Bo, Zhang, Leo Yu, Zhou, Wanlei, Zhang, Yang
Duplication is a prevalent issue within datasets. Existing research has demonstrated that the presence of duplicated data in training datasets can significantly influence both model performance and data privacy. However, the impact of data duplication on the unlearning process remains largely unexplored. This paper addresses this gap by pioneering a comprehensive investigation into the role of data duplication, not only in standard machine unlearning but also in federated and reinforcement unlearning paradigms. Specifically, we propose an adversary who duplicates a subset of the target model's training set and incorporates these duplicates into the training set. After training, the adversary requests the model owner to unlearn this duplicated subset and analyzes the impact on the unlearned model. For example, the adversary can challenge the model owner by revealing that, despite efforts to unlearn it, the influence of the duplicated subset remains in the model. Moreover, to circumvent detection by de-duplication techniques, we propose three novel near-duplication methods for the adversary, each tailored to a specific unlearning paradigm. We then examine their impacts on the unlearning process when de-duplication techniques are applied. Our findings reveal several crucial insights: 1) the gold standard unlearning method, retraining from scratch, fails to effectively conduct unlearning under certain conditions; 2) unlearning duplicated data can lead to significant model degradation in specific scenarios; and 3) meticulously crafted duplicates can evade detection by de-duplication methods.
TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anomaly Detection
Cao, Yang, Yang, Sikun, Li, Chen, Xiang, Haolong, Qi, Lianyong, Liu, Bo, Li, Rongsheng, Liu, Ming
Anomaly detection is a critical task in machine learning, with applications ranging from fraud detection and content moderation to user behavior analysis (Pang et al., 2021). Within natural language processing (NLP), anomaly detection has become increasingly relevant for identifying outliers such as harmful content, phishing attempts, and spam reviews. However, while AD tasks in structured data (e.g., tabular, time series, graphs) (Steinbuss and Böhm, 2021; Blázquez-García et al., 2021; Qiao et al., 2024) have achieved significant maturity, [...] Existing studies often lack systematic evaluations of how different embeddings perform across diverse anomaly types, raising questions about their generalization capabilities in complex, real-world scenarios such as multilingual settings or domain-specific anomalies. Recent efforts, such as AD-NLP (Bejan et al., 2023) and NLP-ADBench (Li et al., 2024), have significantly advanced anomaly detection in NLP. AD-NLP provides valuable insights into different types of anomalies, while NLP-ADBench expands evaluations to a wide range of algorithms and datasets.
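The embedding-based detection pipeline the benchmark evaluates typically scores each document by how isolated its embedding is; a common instance is the k-nearest-neighbour distance. The 2-D "embeddings" below are hypothetical stand-ins for model-produced vectors.

```python
import math

# Sketch of embedding-based anomaly detection: score each point by
# its distance to its k-th nearest neighbour in embedding space.

def knn_score(x, others, k=2):
    dists = sorted(math.dist(x, o) for o in others)
    return dists[k - 1]          # k-th nearest-neighbour distance

# Four clustered points and one isolated outlier (toy 2-D embeddings).
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

scores = [knn_score(e, [o for o in embeddings if o is not e])
          for e in embeddings]
anomaly = embeddings[max(range(len(scores)), key=scores.__getitem__)]
print(anomaly)  # → (5.0, 5.0): the isolated point stands out
```

Swapping the embedding model or the downstream detector (kNN, LOF, one-class SVM, ...) changes the scores, which is precisely the combination space a benchmark of this kind has to evaluate systematically.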
AI-based Identity Fraud Detection: A Systematic Review
Zhang, Chuo Jun, Gill, Asif Q., Liu, Bo, Anwar, Memoona J.
With the rapid development of digital services, a large volume of personally identifiable information (PII) is stored online and is subject to cyberattacks such as identity fraud. Most recently, the use of Artificial Intelligence (AI)-enabled deepfake technologies has significantly increased the complexity of identity fraud. Fraudsters may use these technologies to create highly sophisticated counterfeit personal identification documents, photos, and videos. These advancements in the identity fraud landscape pose challenges for identity fraud detection and society at large. There is a pressing need to review and understand identity fraud detection methods, their limitations, and potential solutions. This research aims to address this important need by using the well-known systematic literature review method. This paper reviewed a selected set of 43 papers across 4 major academic literature databases. In particular, the review highlights two types of identity fraud prevention and detection methods, examines them in depth, and identifies open challenges. The results were also consolidated into a taxonomy of AI-based identity fraud detection and prevention methods, including key insights and trends. Overall, this paper provides a foundational knowledge base for researchers and practitioners for further research and development in this important area of digital identity fraud.