Desrosiers, Christian
FAA-CLIP: Federated Adversarial Adaptation of CLIP
Wu, Yihang, Chaddad, Ahmad, Desrosiers, Christian, Daqqaq, Tareef, Kateb, Reem
Despite the remarkable performance of vision-language models (VLMs) such as Contrastive Language-Image Pre-training (CLIP), the large size of these models is a considerable obstacle to their use in federated learning (FL) systems, where the parameters of local client models need to be transferred to a global server for aggregation. Another challenge in FL is the heterogeneity of data from different clients, which affects the generalization performance of the solution. In addition, VLMs pre-trained on natural images exhibit poor generalization on medical datasets, suggesting a domain gap. To solve these issues, we introduce a novel method for the Federated Adversarial Adaptation (FAA) of CLIP. Our method, named FAA-CLIP, handles the large communication costs of CLIP using a light-weight feature adaptation module (FAM) for aggregation, effectively adapting this VLM to each client's data while greatly reducing the number of parameters to transfer. By keeping CLIP frozen and only updating the FAM parameters, our method is also computationally efficient. Unlike existing approaches, our FAA-CLIP method directly addresses the problem of domain shifts across clients via a domain adaptation (DA) module. This module employs a domain classifier to predict whether a given sample comes from the local client or the global server, allowing the model to learn domain-invariant representations. Extensive experiments on six datasets containing both natural and medical images demonstrate that FAA-CLIP generalizes well on both natural and medical data compared to recent FL approaches. Our codes are available at https://github.com/AIPMLab/F

While models based on deep learning (DL) have achieved ground-breaking results in a broad range of computer vision and natural language understanding tasks, their performance often depends on the availability of large datasets [1]. In recent years, there has been growing concern about ensuring data privacy and security, with many organizations implementing regulations and laws such as the EU General Data Protection Regulation (GDPR) [2]. These restrictions on sharing raw data between organizations pose a significant challenge for training robust DL models in fields like medical imaging, where privacy is of utmost importance. One of the most promising solutions to this problem is federated learning (FL).
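To make the mechanism concrete, here is a minimal PyTorch sketch of a light-weight adapter paired with an adversarial domain classifier. The module sizes, the residual connection, and the gradient-reversal formulation are illustrative assumptions, not the exact FAA-CLIP architecture:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer commonly used in adversarial domain adaptation."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the adapter.
        return -ctx.lambd * grad_output, None

class FeatureAdaptationModule(nn.Module):
    """Small adapter on top of frozen CLIP features (illustrative sizes)."""
    def __init__(self, dim=512):
        super().__init__()
        self.adapter = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Binary classifier: does a feature come from the local client or the global model?
        self.domain_clf = nn.Linear(dim, 2)

    def forward(self, clip_feats, lambd=1.0):
        adapted = self.adapter(clip_feats) + clip_feats   # residual adaptation
        domain_logits = self.domain_clf(GradReverse.apply(adapted, lambd))
        return adapted, domain_logits
```

Under this design, only the adapter's parameters would be sent to the server each round, while the CLIP backbone stays frozen on every client.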
Simulations of Common Unsupervised Domain Adaptation Algorithms for Image Classification
Chaddad, Ahmad, Wu, Yihang, Jiang, Yuchen, Bouridane, Ahmed, Desrosiers, Christian
Traditional machine learning assumes that training and test sets are drawn from the same distribution; however, this assumption does not always hold in practical applications. This distribution disparity can lead to severe performance drops when the trained model is applied to new data sets. Domain adaptation (DA) is a machine learning technique that aims to address this problem by reducing the differences between domains. This paper presents simulations of recent DA algorithms, mainly for unsupervised domain adaptation (UDA), where labels are available only in the source domain. Our study compares these techniques on public data sets with diverse characteristics, highlighting their respective strengths and drawbacks. For example, Safe Self-Refinement for Transformer-based DA (SSRT) achieved the highest accuracy (91.6%) on the Office-31 data set in our simulations; however, its accuracy dropped to 72.4% on the Office-Home data set when using limited batch sizes. In addition to improving the reader's comprehension of recent DA techniques, our study highlights challenges and upcoming research directions in this domain. The codes are available at https://github.com/AIPMLab/Domain_Adaptation.
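As a point of reference for the setting these simulations cover, a generic UDA training step combines a supervised loss on labeled source data with an unsupervised alignment loss on unlabeled target data. The sketch below uses a simple mean-feature (linear-kernel MMD) discrepancy as one common choice of alignment loss; it is not the objective of any specific method compared in the paper:

```python
import torch

def mmd_loss(source_feats, target_feats):
    """Linear-kernel MMD: squared distance between domain feature means."""
    return (source_feats.mean(0) - target_feats.mean(0)).pow(2).sum()

def uda_step(encoder, classifier, x_src, y_src, x_tgt, alpha=0.1):
    """One UDA step: supervised source loss + unsupervised domain alignment.
    Note: labels exist only for the source domain."""
    f_src, f_tgt = encoder(x_src), encoder(x_tgt)
    cls_loss = torch.nn.functional.cross_entropy(classifier(f_src), y_src)
    return cls_loss + alpha * mmd_loss(f_src, f_tgt)
```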
ReC-TTT: Contrastive Feature Reconstruction for Test-Time Training
Colussi, Marco, Mascetti, Sergio, Dolz, Jose, Desrosiers, Christian
The remarkable progress in deep learning (DL) has produced outstanding results in various computer vision tasks. However, adapting to real-time variations in data distributions remains an important challenge. Test-Time Training (TTT) was proposed as an effective solution to this issue: it increases the generalization ability of trained models by adding an auxiliary task at train time and then using its loss at test time to adapt the model. Inspired by the recent achievements of contrastive representation learning in unsupervised tasks, we propose ReC-TTT, a test-time training technique that can adapt a DL model to new unseen domains by generating discriminative views of the input data. ReC-TTT uses cross-reconstruction as an auxiliary task between a frozen encoder and two trainable encoders, taking advantage of a single shared decoder. At test time, this allows the encoders to be adapted so that they extract features that will be correctly reconstructed by the decoder, which is frozen on the source domain in this phase. Experimental results show that ReC-TTT achieves better results than other state-of-the-art techniques in most domain-shift classification challenges.
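A minimal sketch of how such a cross-reconstruction auxiliary loss might be wired is shown below. For brevity it uses a single trainable encoder rather than the paper's two, and the MSE reconstruction objective is an assumption:

```python
import torch.nn as nn

class CrossReconstructionSketch(nn.Module):
    """Frozen and trainable encoders sharing one decoder (illustrative)."""
    def __init__(self, frozen_enc, trainable_enc, shared_dec):
        super().__init__()
        self.frozen_enc = frozen_enc.eval()
        for p in self.frozen_enc.parameters():
            p.requires_grad = False                 # source-trained, never updated
        self.trainable_enc = trainable_enc          # adapted at test time
        self.shared_dec = shared_dec                # frozen during adaptation

    def aux_loss(self, x):
        # Both encoders' features must be reconstructed back to the input
        # by the single shared decoder.
        rec_frozen = self.shared_dec(self.frozen_enc(x))
        rec_train = self.shared_dec(self.trainable_enc(x))
        return nn.functional.mse_loss(rec_frozen, x) + nn.functional.mse_loss(rec_train, x)
```

At test time, only trainable_enc would be updated by minimizing aux_loss on the incoming batch, steering its features toward ones the source-frozen decoder can reconstruct.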
FACMIC: Federated Adaptative CLIP Model for Medical Image Classification
Wu, Yihang, Desrosiers, Christian, Chaddad, Ahmad
Federated learning (FL) has emerged as a promising approach to medical image analysis that allows deep models to be trained on decentralized data while ensuring data privacy. However, in FL, communication cost plays a critical role in the overall performance of a solution, and transferring vision foundation models can be particularly challenging due to the significant resource costs involved. In this paper, we introduce a federated adaptive Contrastive Language-Image Pretraining (CLIP) model designed for classification tasks. We employ a light-weight and efficient feature attention module for CLIP that selects suitable features for each client's data. Additionally, we propose a domain adaptation technique to reduce differences in data distribution between clients. Experimental results on four publicly available datasets demonstrate the superior performance of FACMIC in dealing with real-world and multisource medical imaging data. Our codes are available at https://github.com/AIPMLab/FACMIC.
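One plausible reading of "feature attention" is a channel-wise gating of the frozen CLIP features; the squeeze-and-excitation style form below is an illustrative assumption, not FACMIC's exact module:

```python
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Channel-wise gating of frozen CLIP features (illustrative SE-style form)."""
    def __init__(self, dim=512, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, clip_feats):
        # Re-weight the feature channels most useful for this client's data.
        return clip_feats * self.gate(clip_feats)
```

Because only this small module (not the CLIP backbone) would be trained and communicated, each FL round stays cheap.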
GeoMask3D: Geometrically Informed Mask Selection for Self-Supervised Point Cloud Learning in 3D
Bahri, Ali, Yazdanpanah, Moslem, Noori, Mehrdad, Cheraghalikhani, Milad, Hakim, Gustavo Adolfo Vargas, Osowiechi, David, Beizaee, Farzad, Ayed, Ismail Ben, Desrosiers, Christian
We introduce a pioneering approach to self-supervised learning for point clouds, employing a geometrically informed mask selection strategy called GeoMask3D (GM3D) to boost the efficiency of Masked Autoencoders (MAE). Unlike the conventional method of random masking, our technique uses a teacher-student model to focus on intricate areas within the data, guiding the model toward regions of higher geometric complexity. This strategy is grounded in the hypothesis that concentrating on harder patches yields a more robust feature representation, as evidenced by improved performance on downstream tasks. Our method also presents a complete-to-partial, feature-level knowledge distillation technique designed to guide the prediction of geometric complexity using the comprehensive context of feature-level information. Extensive experiments confirm our method's superiority over state-of-the-art (SOTA) baselines, demonstrating marked improvements in classification and few-shot tasks.
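The core idea, masking the hardest patches instead of random ones, can be sketched as follows. Treating a teacher's per-patch score (e.g., its reconstruction error) as the geometric-complexity signal is an illustrative assumption:

```python
import torch

def select_geometric_mask(patch_scores, mask_ratio=0.6):
    """Mask the patches a teacher deems hardest, instead of masking at random.

    patch_scores: (B, N) per-patch geometric-complexity scores, e.g. a
    teacher's reconstruction error (illustrative choice).
    Returns a boolean (B, N) mask, True where a patch is masked for the MAE.
    """
    B, N = patch_scores.shape
    k = int(mask_ratio * N)
    idx = patch_scores.topk(k, dim=1).indices      # the k hardest patches
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask
```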
CLIPArTT: Light-weight Adaptation of CLIP to New Domains at Test Time
Hakim, Gustavo Adolfo Vargas, Osowiechi, David, Noori, Mehrdad, Cheraghalikhani, Milad, Bahri, Ali, Yazdanpanah, Moslem, Ayed, Ismail Ben, Desrosiers, Christian
Pre-trained vision-language models (VLMs), exemplified by CLIP, demonstrate remarkable adaptability across zero-shot classification tasks without additional training. However, their performance diminishes in the presence of domain shifts. In this study, we introduce CLIP Adaptation duRing Test-Time (CLIPArTT), a fully test-time adaptation (TTA) approach for CLIP that automatically constructs text prompts during inference for use as text supervision. Our method employs a unique, minimally invasive text prompt tuning process, wherein multiple predicted classes are aggregated into a single new text prompt, used as a pseudo-label to re-classify inputs in a transductive manner. Additionally, we pioneer the standardization of TTA benchmarks (e.g., TENT) in the realm of VLMs. Our findings demonstrate that, without requiring additional transformations or new trainable modules, CLIPArTT enhances performance dynamically across non-corrupted datasets such as CIFAR-10, corrupted datasets like CIFAR-10-C and CIFAR-10.1, and synthetic datasets such as VisDA-C. This research underscores the potential for improving VLMs' adaptability through novel test-time strategies, offering insights for robust performance across varied datasets and environments. The code can be found at: https://github.com/dosowiechi/CLIPArTT.git
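A simplified sketch of the prompt-aggregation idea, using the OpenAI CLIP package, is shown below. The prompt template and the "or"-joining of classes are assumptions, and the transductive re-classification step is omitted:

```python
import torch
import clip  # OpenAI CLIP package

def pseudo_prompt_labels(model, images, class_names, device, K=3):
    """Build a pseudo text prompt from the top-K predicted classes per image
    (simplified; CLIPArTT's transductive re-classification is not shown)."""
    with torch.no_grad():
        tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
        logits_per_image, _ = model(images, tokens)       # zero-shot class scores
        topk = logits_per_image.topk(K, dim=-1).indices   # (B, K) class indices
        joined = [" or ".join(class_names[i] for i in row) for row in topk.tolist()]
        # One aggregated pseudo-prompt per test image, used as text supervision.
        pseudo = clip.tokenize([f"a photo of a {p}" for p in joined]).to(device)
    return pseudo
```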
NC-TTT: A Noise Contrastive Approach for Test-Time Training
Osowiechi, David, Hakim, Gustavo A. Vargas, Noori, Mehrdad, Cheraghalikhani, Milad, Bahri, Ali, Yazdanpanah, Moslem, Ayed, Ismail Ben, Desrosiers, Christian
A crucial requirement for the success of traditional deep learning methods is that training and testing data should be sampled from the same distribution. As widely shown in the literature (Recht et al. [2018], Peng et al. [2018]), this assumption rarely holds in practice, and a model's performance can drop dramatically in the presence of domain shifts. The field of Domain Adaptation (DA) has emerged to address this important issue, proposing various mechanisms that adapt learning algorithms to new domains. In the realm of domain adaptation, two notable directions of research have surfaced: Domain Generalization and Test-Time Adaptation. Domain Generalization (DG) approaches (Volpi et al. [2018], Prakash et al. [2019], Zhou et al. [2020], Kim et al. [2022], Wang et al. [2022]) typically train a model with an extensive source dataset encompassing diverse domains and augmentations, so that it can achieve a good performance on test examples from unseen domains, without retraining. Conversely, Test-Time Adaptation (TTA) (Wang et al. [2021], Khurana et al. [2021], Boudiaf et al. [2022]) entails the dynamic adjustment of the model to test data in real-time, typically adapting to subsets of the new domain, such as mini-batches. TTA presents a challenging, yet practical problem as it functions without supervision for test samples or access to the source domain data.
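To ground the TTA setting described here, the sketch below shows one classic adaptation step in the style of TENT (Wang et al. [2021]), minimizing prediction entropy on an unlabeled test mini-batch. It illustrates the general setting only, not NC-TTT's own noise-contrastive objective:

```python
import torch

def tta_step(model, optimizer, x_batch):
    """One test-time adaptation step: minimize prediction entropy on an
    unlabeled test mini-batch (TENT-style illustration, not NC-TTT itself)."""
    probs = model(x_batch).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()   # typically only norm-layer affine parameters are updated
    return entropy.item()
```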
ClusT3: Information Invariant Test-Time Training
Hakim, Gustavo A. Vargas, Osowiechi, David, Noori, Mehrdad, Cheraghalikhani, Milad, Ayed, Ismail Ben, Desrosiers, Christian
Deep learning models have shown remarkable performance in a broad range of vision tasks. However, they are often vulnerable to domain shifts at test time. Test-time training (TTT) methods have been developed to mitigate these vulnerabilities: a secondary task is solved at training time alongside the main task, then later used as a self-supervised proxy task at test time. In this work, we propose a novel unsupervised TTT technique based on maximizing the mutual information between multi-scale feature maps and a discrete latent representation, which can be integrated into standard training as an auxiliary clustering task. Experimental results demonstrate competitive classification performance on popular test-time adaptation benchmarks.
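For intuition, mutual information between samples and discrete cluster assignments can be estimated from soft assignments as the entropy of the mean posterior minus the mean per-sample entropy. The sketch below shows this standard estimator, as a generic illustration rather than ClusT3's exact multi-scale objective:

```python
import torch

def mutual_information(cluster_logits):
    """MI between samples and discrete cluster assignments, estimated from
    soft assignments: I = H(mean posterior) - mean per-sample entropy."""
    p = cluster_logits.softmax(dim=-1)                 # (B, K) soft assignments
    marginal = p.mean(dim=0)                           # (K,) cluster usage
    h_marginal = -(marginal * marginal.clamp_min(1e-8).log()).sum()
    h_cond = -(p * p.clamp_min(1e-8).log()).sum(dim=-1).mean()
    return h_marginal - h_cond   # maximize this as the auxiliary task
```

Maximizing this quantity pushes assignments to be confident per sample (low conditional entropy) while using all clusters overall (high marginal entropy).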
MoP-CLIP: A Mixture of Prompt-Tuned CLIP Models for Domain Incremental Learning
Nicolas, Julien, Chiaroni, Florent, Ziko, Imtiaz, Ahmad, Ola, Desrosiers, Christian, Dolz, Jose
Despite the recent progress in incremental learning, addressing catastrophic forgetting under distributional drift is still an open and important problem. Indeed, while state-of-the-art domain incremental learning (DIL) methods perform satisfactorily within known domains, their performance degrades largely in the presence of novel domains. This limitation hampers their generalizability and restricts their scalability to more realistic settings where train and test data are drawn from different distributions. To address these limitations, we present a novel DIL approach based on a mixture of prompt-tuned CLIP models (MoP-CLIP), which generalizes the paradigm of S-Prompting to handle both in-distribution and out-of-distribution data at inference. In particular, at the training stage we model the feature distribution of every class in each domain, learning individual text and visual prompts to adapt to a given domain. At inference, the learned distributions allow us to identify whether a given test sample belongs to a known domain, in which case the correct prompt is selected for the classification task, or to an unseen domain, in which case a mixture of the prompt-tuned CLIP models is leveraged. Our empirical evaluation reveals the poor performance of existing DIL methods under domain shift, and shows that the proposed MoP-CLIP performs competitively in standard DIL settings while outperforming state-of-the-art methods in OOD scenarios. These results demonstrate the superiority of MoP-CLIP, offering a robust and general solution to the problem of domain incremental learning.
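The inference-time routing could look like the sketch below, where each known domain is summarized by a diagonal-Gaussian feature distribution and a likelihood threshold decides between a single domain's prompts and the mixture. The density model and threshold rule are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import torch

def route_to_prompt(feat, domain_means, domain_vars, threshold):
    """Pick the prompt of the most likely known domain, or fall back to the
    mixture when the sample looks out-of-distribution (illustrative)."""
    # Diagonal-Gaussian log-likelihood of the feature under each domain
    # (constant terms dropped, since only the argmax and threshold matter).
    ll = torch.stack([
        -0.5 * (((feat - m) ** 2 / v) + v.log()).sum()
        for m, v in zip(domain_means, domain_vars)])
    best = ll.argmax().item()
    if ll[best] >= threshold:
        return best      # in-distribution: use that domain's tuned prompts
    return None          # OOD: mix predictions from all prompt-tuned models
```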
What Matters in Reinforcement Learning for Tractography
Théberge, Antoine, Desrosiers, Christian, Descoteaux, Maxime, Jodoin, Pierre-Marc
Recently, deep reinforcement learning (RL) has been proposed to learn the tractography procedure and train agents to reconstruct the structure of the white matter without manually curated reference streamlines. While the reported performance was competitive, the proposed framework is complex, and little is known about the role and impact of its multiple parts. In this work, we thoroughly explore the different components of the framework, such as the choice of RL algorithm, seeding strategy, input signal, and reward function, and shed light on their impact. Approximately 7,400 models were trained for this work, totalling nearly 41,000 hours of GPU time. Our goal is to guide researchers eager to explore the possibilities of deep RL for tractography by exposing what does and does not work with this category of approach. As such, we ultimately propose a series of recommendations concerning the choice of RL algorithm, the input to the agents, the reward function, and more, to help future work using reinforcement learning for tractography. We also release the open-source codebase, trained models, and datasets for users and researchers wanting to explore reinforcement learning for tractography.
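For readers unfamiliar with the setup, an RL tractography rollout can be pictured as an agent repeatedly reading the local diffusion signal and stepping along its chosen direction. The sketch below is purely illustrative: signal_at, act, and outside_mask are hypothetical helper names, not the framework's API:

```python
import numpy as np

def track_streamline(agent, odf_volume, seed, step_size=0.5, max_steps=200):
    """Illustrative RL tractography rollout (hypothetical helper names)."""
    pos = np.asarray(seed, dtype=float)
    streamline = [pos.copy()]
    for _ in range(max_steps):
        state = odf_volume.signal_at(pos)   # input signal at current position
        direction = agent.act(state)        # unit direction chosen by the policy
        pos = pos + step_size * direction
        streamline.append(pos.copy())
        if odf_volume.outside_mask(pos):    # stop outside the tracking mask
            break
    return np.stack(streamline)
```

In this picture, the paper's ablations correspond to varying what goes into state (the input signal), how seed points are drawn (seeding strategy), how the policy is trained (RL algorithm), and what score each step receives (reward function).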