Goubran, Maged
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Ramanathan, Vishwesh, Xu, Tony, Pati, Pushpak, Ahmed, Faruk, Goubran, Maged, Martel, Anne L.
Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, working with these models is challenging, with issues such as catastrophic forgetting during fine-tuning and under-utilization of shared information between tasks and modalities. To overcome these two challenges, we propose ModalTune, a novel fine-tuning framework which introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large-language models (LLMs) to encode labels as text, capturing semantic relationships and enhancing generalization across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. Additionally, we show ModalTune is highly generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.
SELMA3D challenge: Self-supervised learning for 3D light-sheet microscopy image segmentation
Chen, Ying, Al-Maskari, Rami, Horvath, Izabela, Ali, Mayar, Hoher, Luciano, Yang, Kaiyuan, Lin, Zengming, Zhai, Zhiwei, Shen, Mengzhe, Xun, Dejin, Wang, Yi, Xu, Tony, Goubran, Maged, Wu, Yunheng, Mori, Kensaku, Paetzold, Johannes C., Erturk, Ali
Recent innovations in light sheet microscopy, paired with developments in tissue clearing techniques, enable the 3D imaging of large mammalian tissues with cellular resolution. Combined with the progress in large-scale data analysis, driven by deep learning, these innovations empower researchers to rapidly investigate the morphological and functional properties of diverse biological samples. Segmentation, a crucial preliminary step in the analysis process, can be automated using domain-specific deep learning models with expert-level performance. However, these models exhibit high sensitivity to domain shifts, leading to a significant drop in accuracy when applied to data outside their training distribution. To address this limitation, and inspired by the recent success of self-supervised learning in training generalizable models, we organized the SELMA3D Challenge during the MICCAI 2024 conference. SELMA3D provides a vast collection of light-sheet images from cleared mice and human brains, comprising 35 large 3D images-each with over 1000^3 voxels-and 315 annotated small patches for finetuning, preliminary testing and final testing. The dataset encompasses diverse biological structures, including vessel-like and spot-like structures. Five teams participated in all phases of the challenge, and their proposed methods are reviewed in this paper. Quantitative and qualitative results from most participating teams demonstrate that self-supervised learning on large datasets improves segmentation model performance and generalization. We will continue to support and extend SELMA3D as an inaugural MICCAI challenge focused on self-supervised learning for 3D microscopy image segmentation.
Mamba-based Deep Learning Approaches for Sleep Staging on a Wireless Multimodal Wearable System without Electroencephalography
Zhang, Andrew H., He-Mo, Alex, Yin, Richard Fei, Li, Chunlin, Tang, Yuzhi, Gurve, Dharmendra, Ghahjaverestan, Nasim Montazeri, Goubran, Maged, Wang, Bo, Lim, Andrew S. P.
Study Objectives: We investigate using Mamba-based deep learning approaches for sleep staging on signals from ANNE One (Sibel Health, Evanston, IL), a minimally intrusive dual-sensor wireless wearable system measuring chest electrocardiography (ECG), triaxial accelerometry, and temperature, as well as finger photoplethysmography (PPG) and temperature. Methods: We obtained wearable sensor recordings from 360 adults undergoing concurrent clinical polysomnography (PSG) at a tertiary care sleep lab. PSG recordings were scored according to AASM criteria. PSG and wearable sensor data were automatically aligned using their ECG channels with manual confirmation by visual inspection. We trained Mamba-based models with both convolutional-recurrent neural network (CRNN) and the recurrent neural network (RNN) architectures on these recordings. Ensembling of model variants with similar architectures was performed. Results: Our best approach, after ensembling, attains a 3-class (wake, NREM, REM) balanced accuracy of 83.50%, F1 score of 84.16%, Cohen's $\kappa$ of 72.68%, and a MCC score of 72.84%; a 4-class (wake, N1/N2, N3, REM) balanced accuracy of 74.64%, F1 score of 74.56%, Cohen's $\kappa$ of 61.63%, and MCC score of 62.04%; a 5-class (wake, N1, N2, N3, REM) balanced accuracy of 64.30%, F1 score of 66.97%, Cohen's $\kappa$ of 53.23%, MCC score of 54.38%. Conclusions: Deep learning models can infer major sleep stages from a wearable system without electroencephalography (EEG) and can be successfully applied to data from adults attending a tertiary care sleep clinic.