Data-Centric Foundation Models in Computational Healthcare: A Survey

Zhang, Yunkun, Gao, Jin, Tan, Zheling, Zhou, Lingfeng, Ding, Kexin, Zhou, Mu, Zhang, Shaoting, Wang, Dequan

arXiv.org Artificial Intelligence 

In computational healthcare [3, 72], foundation models (FMs) can handle a variety of clinical data thanks to their strong capabilities in logical reasoning and semantic understanding. Examples span medical conversation [241, 316], patient health profiling [48], and treatment planning [192]. Moreover, given their strength in large-scale data processing, FMs offer a paradigm shift for assessing real-world clinical data rapidly and effectively within the healthcare workflow [208, 261]. FM research places a sharp focus on the data-centric perspective [318]. First, FMs demonstrate the power of scale: larger models and datasets allow FMs to capture vast amounts of information, which in turn heightens the need for large quantities of training data [272]. Second, FMs encourage homogenization [21], as evidenced by their broad adaptability to downstream tasks. High-quality training data therefore becomes critical, since it affects the performance of both the pre-trained FM and the downstream models built on it. As a result, addressing key data challenges is increasingly recognized as a research priority.