Tan, Zhaorui
ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression
Yao, Kai, Tan, Zhaorui, Ye, Tiandi, Li, Lichun, Zhao, Yuan, Liu, Wenyan, Wang, Wei, Zhu, Jianke
Offsite-tuning is a privacy-preserving method for tuning large language models (LLMs) by sharing a lossy compressed emulator from the LLM owners with data owners for downstream task tuning. This approach protects the privacy of both the model and data owners. However, current offsite tuning methods often suffer from adaptation degradation, high computational costs, and limited protection strength due to uniformly dropping LLM layers or relying on expensive knowledge distillation. To address these issues, we propose ScaleOT, a novel privacy-utility-scalable offsite-tuning framework that effectively balances privacy and utility. ScaleOT introduces a novel layerwise lossy compression algorithm that uses reinforcement learning to obtain the importance of each layer. It employs lightweight networks, termed harmonizers, to replace the raw LLM layers. By combining important original LLM layers and harmonizers in different ratios, ScaleOT generates emulators tailored for optimal performance with various model scales for enhanced privacy protection. Additionally, we present a rank reduction method to further compress the original LLM layers, significantly enhancing privacy with negligible impact on utility. Comprehensive experiments show that ScaleOT can achieve nearly lossless offsite tuning performance compared with full fine-tuning while obtaining better model privacy.
Personalize to generalize: Towards a universal medical multi-modality generalization through personalization
Tan, Zhaorui, Yang, Xi, Pan, Tan, Liu, Tianyi, Jiang, Chen, Guo, Xin, Wang, Qiufeng, Nguyen, Anh, Qi, Yuan, Huang, Kaizhu, Cheng, Yuan
The differences among medical imaging modalities, driven by distinct underlying principles, pose significant challenges for generalization in multi-modal medical tasks. Beyond modality gaps, individual variations, such as differences in organ size and metabolic rate, further impede a model's ability to generalize effectively across both modalities and diverse populations. Despite the importance of personalization, existing approaches to multi-modal generalization often neglect individual differences, focusing solely on common anatomical features. This limitation may result in weakened generalization in various medical tasks. In this paper, we unveil that personalization is critical for multi-modal generalization. Specifically, we propose an approach to achieve personalized generalization through approximating the underlying personalized invariant representation ${X}_h$ across various modalities by leveraging individual-level constraints and a learnable biological prior. We validate the feasibility and benefits of learning a personalized ${X}_h$, showing that this representation is highly generalizable and transferable across various multi-modal medical tasks. Extensive experimental results consistently show that the additionally incorporated personalization significantly improves performance and generalization across diverse scenarios, confirming its effectiveness.
Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification
Tan, Zhaorui, Yang, Xi, Wang, Qiufeng, Nguyen, Anh, Huang, Kaizhu
Vision models excel in image classification but struggle to generalize to unseen data, such as classifying images from unseen domains or discovering novel categories. In this paper, we explore the relationship between logical reasoning and deep learning generalization in visual classification. A logical regularization termed L-Reg is derived which bridges a logical analysis framework to image classification. Our work reveals that L-Reg reduces the complexity of the model in terms of the feature distribution and classifier weights. Specifically, we unveil the interpretability brought by L-Reg, as it enables the model to extract the salient features, such as faces to persons, for classification. Theoretical analysis and experiments demonstrate that L-Reg enhances generalization across various scenarios, including multi-domain generalization and generalized category discovery. In complex real-world scenarios where images span unknown classes and unseen domains, L-Reg consistently improves generalization, highlighting its practical efficacy.
Covariance-based Space Regularization for Few-shot Class Incremental Learning
Hu, Yijie, Yang, Guanyu, Tan, Zhaorui, Huang, Xiaowei, Huang, Kaizhu, Wang, Qiu-Feng
Few-shot Class Incremental Learning (FSCIL) presents a challenging yet realistic scenario, which requires the model to continually learn new classes with limited labeled data (i.e., incremental sessions) while retaining knowledge of previously learned base classes (i.e., base sessions). Due to the limited data in incremental sessions, models are prone to overfitting new classes and suffering catastrophic forgetting of base classes. To tackle these issues, recent advancements resort to prototype-based approaches to constrain the base class distribution and learn discriminative representations of new classes. Despite the progress, the limited data issue still induces ill-divided feature space, leading the model to confuse the new class with old classes or fail to facilitate good separation among new classes. In this paper, we aim to mitigate these issues by directly constraining the span of each class distribution from a covariance perspective. In detail, we propose a simple yet effective covariance constraint loss to force the model to learn each class distribution with the same covariance matrix. In addition, we propose a perturbation approach to perturb the few-shot training samples in the feature space, which encourages the samples to be away from the weighted distribution of other classes. Regarding perturbed samples as new class data, the classifier is forced to establish explicit boundaries between each new class and the existing ones. Our approach is easy to integrate into existing FSCIL approaches to boost performance. Experiments on three benchmarks validate the effectiveness of our approach, achieving a new state-of-the-art performance of FSCIL.
Mind the Gap: Promoting Missing Modality Brain Tumor Segmentation with Alignment
Liu, Tianyi, Tan, Zhaorui, Jiang, Haochuan, Yang, Xi, Huang, Kaizhu
Brain tumor segmentation is often based on multiple magnetic resonance imaging (MRI). However, in clinical practice, certain modalities of MRI may be missing, which presents an even more difficult scenario. To cope with this challenge, knowledge distillation has emerged as one promising strategy. However, recent efforts typically overlook the modality gaps and thus fail to learn invariant feature representations across different modalities. Such drawback consequently leads to limited performance for both teachers and students. To ameliorate these problems, in this paper, we propose a novel paradigm that aligns latent features of involved modalities to a well-defined distribution anchor. As a major contribution, we prove that our novel training paradigm ensures a tight evidence lower bound, thus theoretically certifying its effectiveness. Extensive experiments on different backbones validate that the proposed paradigm can enable invariant feature representations and produce a teacher with narrowed modality gaps. This further offers superior guidance for missing modality students, achieving an average improvement of 1.75 on dice score.
MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment
Liu, Tianyi, Tan, Zhaorui, Chen, Muyin, Yang, Xi, Jiang, Haochuan, Huang, Kaizhu
Brain tumor segmentation is often based on multiple magnetic resonance imaging (MRI). However, in clinical practice, certain modalities of MRI may be missing, which presents a more difficult scenario. To cope with this challenge, Knowledge Distillation, Domain Adaption, and Shared Latent Space have emerged as commonly promising strategies. However, recent efforts typically overlook the modality gaps and thus fail to learn important invariant feature representations across different modalities. Such drawback consequently leads to limited performance for missing modality models. To ameliorate these problems, pre-trained models are used in natural visual segmentation tasks to minimize the gaps. However, promising pre-trained models are often unavailable in medical image segmentation tasks. Along this line, in this paper, we propose a novel paradigm that aligns latent features of involved modalities to a well-defined distribution anchor as the substitution of the pre-trained model}. As a major contribution, we prove that our novel training paradigm ensures a tight evidence lower bound, thus theoretically certifying its effectiveness. Extensive experiments on different backbones validate that the proposed paradigm can enable invariant feature representations and produce models with narrowed modality gaps. Models with our alignment paradigm show their superior performance on both BraTS2018 and BraTS2020 datasets.
Rethinking Information Loss in Medical Image Segmentation with Various-sized Targets
Liu, Tianyi, Tan, Zhaorui, Huang, Kaizhu, Jiang, Haochuan
Medical image segmentation presents the challenge of segmenting various-size targets, demanding the model to effectively capture both local and global information. Despite recent efforts using CNNs and ViTs to predict annotations of different scales, these approaches often struggle to effectively balance the detection of targets across varying sizes. Simply utilizing local information from CNNs and global relationships from ViTs without considering potential significant divergence in latent feature distributions may result in substantial information loss. To address this issue, in this paper, we will introduce a novel Stagger Network (SNet) and argues that a well-designed fusion structure can mitigate the divergence in latent feature distributions between CNNs and ViTs, thereby reducing information loss. Specifically, to emphasize both global dependencies and local focus, we design a Parallel Module to bridge the semantic gap. Meanwhile, we propose the Stagger Module, trying to fuse the selected features that are more semantically similar. An Information Recovery Module is further adopted to recover complementary information back to the network. As a key contribution, we theoretically analyze that the proposed parallel and stagger strategies would lead to less information loss, thus certifying the SNet's rationale. Experimental results clearly proved that the proposed SNet excels comparisons with recent SOTAs in segmenting on the Synapse dataset where targets are in various sizes. Besides, it also demonstrates superiority on the ACDC and the MoNuSeg datasets where targets are with more consistent dimensions.
Rethinking Multi-domain Generalization with A General Learning Objective
Tan, Zhaorui, Yang, Xi, Huang, Kaizhu
Multi-domain generalization (mDG) is universally aimed to minimize the discrepancy between training and testing distributions to enhance marginal-to-label distribution mapping. However, existing mDG literature lacks a general learning objective paradigm and often imposes constraints on static target marginal distributions. In this paper, we propose to leverage a $Y$-mapping to relax the constraint. We rethink the learning objective for mDG and design a new \textbf{general learning objective} to interpret and analyze most existing mDG wisdom. This general objective is bifurcated into two synergistic amis: learning domain-independent conditional features and maximizing a posterior. Explorations also extend to two effective regularization terms that incorporate prior information and suppress invalid causality, alleviating the issues that come with relaxed constraints. We theoretically contribute an upper bound for the domain alignment of domain-independent conditional features, disclosing that many previous mDG endeavors actually \textbf{optimize partially the objective} and thus lead to limited performance. As such, our study distills a general learning objective into four practical components, providing a general, robust, and flexible mechanism to handle complex domain shifts. Extensive empirical results indicate that the proposed objective with $Y$-mapping leads to substantially better mDG performance in various downstream tasks, including regression, segmentation, and classification.