Tan, Zheling
MedForge: Building Medical Foundation Models Like Open Source Software Development
Tan, Zheling, Ding, Kexin, Gao, Jin, Zhou, Mu, Metaxas, Dimitris, Zhang, Shaoting, Wang, Dequan
Foundation models (FMs) have made significant strides in the healthcare domain. Yet data silos and privacy concerns persist in healthcare systems, hindering safe medical data sharing and collaborative model development among institutions. The collection and curation of scalable clinical datasets have increasingly become the bottleneck for training strong FMs. In this study, we propose Medical Foundation Models Merging (MedForge), a cooperative framework that enables community-driven medical foundation model development while preventing leakage of raw patient data and mitigating the synchronization issues of model development across clinical institutions. MedForge offers a bottom-up model construction mechanism that flexibly merges task-specific Low-Rank Adaptation (LoRA) modules, which adapt to downstream tasks while retaining the original model parameters. Through an asynchronous LoRA module integration scheme, the resulting composite model progressively enhances its overall performance across clinical tasks. MedForge shows strong performance on multiple clinical datasets (e.g., breast cancer, lung cancer, and colon cancer) collected from different institutions. Our major findings highlight the value of collaborative foundation models in advancing multi-center clinical collaboration effectively and cohesively. Our code is publicly available at https://github.com/TanZheling/MedForge.
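The abstract above does not spell out MedForge's merging rule, so the following is only a minimal sketch of the general idea of merging LoRA modules into a frozen base weight, not the authors' implementation. It assumes the standard LoRA update (alpha / r) * B @ A and a plain weighted sum of updates. The function names, the `coeffs` parameter, the matrix shapes, and the three simulated "institution" modules are all hypothetical.

```python
import numpy as np

def lora_delta(A, B, alpha):
    """Low-rank update of one LoRA module: (alpha / r) * B @ A, with r = A.shape[0]."""
    rank = A.shape[0]
    return (alpha / rank) * (B @ A)

def merge_lora_modules(W_base, modules, coeffs=None):
    """Return W_base plus a weighted sum of LoRA updates.

    W_base is copied, so the original (frozen) weights are left untouched.
    coeffs defaults to 1.0 per module (plain sum); this is an assumption,
    not a rule taken from the paper.
    """
    if coeffs is None:
        coeffs = [1.0] * len(modules)
    W = W_base.copy()
    for c, (A, B, alpha) in zip(coeffs, modules):
        W += c * lora_delta(A, B, alpha)
    return W

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 12, 2
W_base = rng.normal(size=(d_out, d_in))

# Hypothetical task-specific modules, e.g. one per contributing institution.
modules = [
    (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)), 4.0)
    for _ in range(3)
]

# Merge all modules at once.
W_batch = merge_lora_modules(W_base, modules)

# Merge the same modules one at a time, in a different arrival order.
W_incremental = W_base.copy()
for idx in (2, 0, 1):
    W_incremental = merge_lora_modules(W_incremental, [modules[idx]])

print(np.allclose(W_batch, W_incremental))
```

Under this additive assumption the result does not depend on the order in which modules arrive, which is one simple way contributions could be integrated asynchronously. Only the low-rank factors are exchanged in this sketch, never the raw training data.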
Data-Centric Foundation Models in Computational Healthcare: A Survey
Zhang, Yunkun, Gao, Jin, Tan, Zheling, Zhou, Lingfeng, Ding, Kexin, Zhou, Mu, Zhang, Shaoting, Wang, Dequan
In computational healthcare [3, 72], FMs can handle a variety of clinical data with their appealing capabilities in logical reasoning and semantic understanding. Examples span medical conversation [241, 316], patient health profiling [48], and treatment planning [192]. Moreover, given their strength in large-scale data processing, FMs offer a paradigm shift for assessing real-world clinical data in the healthcare workflow rapidly and effectively [208, 261]. FM research places a sharp focus on the data-centric perspective [318]. First, FMs demonstrate the power of scale: larger models and datasets allow FMs to capture vast amounts of information, which intensifies the need for large quantities of training data [272]. Second, FMs encourage homogenization [21], as evidenced by their extensive adaptability to downstream tasks. High-quality data for FM training thus becomes critical, since it affects the performance of both the pre-trained FM and the downstream models. Therefore, addressing key data challenges is increasingly recognized as a research priority.