
Collaborating Authors: Zhao, Tian


Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions

arXiv.org Artificial Intelligence

Recent advances in deep learning have revolutionized clinical diagnosis and treatment, offering novel approaches that improve diagnostic precision and treatment efficacy across diverse clinical domains and driving the pursuit of precision medicine. The growing availability of multi-organ, multimodal datasets has accelerated the development of large-scale Medical Multimodal Foundation Models (MMFMs). These models, known for their strong generalization capabilities and rich representational power, are increasingly being adapted to a wide range of clinical tasks, from early diagnosis to personalized treatment strategies. This review offers a comprehensive analysis of recent developments in MMFMs, focusing on three key aspects: datasets, model architectures, and clinical applications. We also explore the challenges and opportunities in optimizing multimodal representations, and we discuss how these advances are shaping the future of healthcare by enabling improved patient outcomes and more efficient clinical workflows.
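
As a concrete illustration of the multimodal representations the review discusses, here is a minimal, hypothetical two-tower sketch in PyTorch (all class names, dimensions, and variables below are invented for illustration and are not taken from the review): image and report-text features are projected into a shared embedding space where paired examples can be compared, a common pattern behind many MMFMs.

```python
# Hypothetical two-tower multimodal encoder sketch (not the architecture
# surveyed in the review). Precomputed image and clinical-text features are
# projected into one shared space so paired examples can be aligned.
import torch
import torch.nn as nn

class TwoTowerMMFM(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, shared_dim=256):
        super().__init__()
        # Stand-ins for heads on top of pretrained encoders
        # (e.g., a ViT for images, a BERT-style model for reports).
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.txt_proj = nn.Linear(txt_dim, shared_dim)

    def forward(self, img_feat, txt_feat):
        # L2-normalize so that dot products are cosine similarities.
        z_img = nn.functional.normalize(self.img_proj(img_feat), dim=-1)
        z_txt = nn.functional.normalize(self.txt_proj(txt_feat), dim=-1)
        return z_img, z_txt

model = TwoTowerMMFM()
img = torch.randn(4, 512)   # batch of precomputed image features
txt = torch.randn(4, 768)   # batch of precomputed report features
z_img, z_txt = model(img, txt)
logits = z_img @ z_txt.t()  # pairwise image/text alignment scores
print(logits.shape)         # torch.Size([4, 4])
```

In the CLIP-style version of this pattern, the diagonal of the logits matrix holds the matched image/report pairs, and a contrastive loss pulls those scores up while pushing the off-diagonal mismatches down.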


GCtx-UNet: Efficient Network for Medical Image Segmentation

arXiv.org Artificial Intelligence

Automated medical image segmentation is critical to the prevention, diagnosis, progression monitoring, and prognosis of various diseases, as well as to quantitative pathology assessment. U-shaped deep neural networks, which comprise an encoder, a decoder, and skip connections, are now the most widely used methods for medical image segmentation. Although U-shaped networks have achieved state-of-the-art performance on numerous medical image segmentation tasks, they still have limitations. One primary limitation is the encoder's limited ability to effectively extract and integrate long-range and local features. Methods based on Convolutional Neural Networks (CNNs), such as UNet [26] and UNet++ [35], excel at capturing local features but struggle to model long-range dependencies within the data. Transformer-based methods such as Swin-UNet [6] can model long-range pixel relations, but they lack the spatial inductive bias needed to model local information, which leads to unsatisfactory results. Past research has explored CNN-Transformer hybrid architectures such as TransUnet [8] to capture both global and local information, but these models often significantly increase the number of parameters.
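
For readers unfamiliar with the U-shaped pattern this paragraph describes, the following is a deliberately tiny PyTorch sketch (far simpler than UNet or GCtx-UNet, and not taken from the paper): an encoder downsamples, a decoder upsamples, and a skip connection concatenates encoder features into the decoder so fine local detail survives alongside coarser context.

```python
# Minimal hypothetical U-shaped segmentation network illustrating the
# encoder-decoder-skip pattern; real UNet variants stack many more levels.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc1 = conv_block(in_ch, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up   = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)   # 32 = 16 (skip) + 16 (upsampled)
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)              # local features at full resolution
        e2 = self.enc2(self.pool(e1))  # coarser, wider-context features
        d1 = self.up(e2)               # upsample back to input resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # skip connection
        return self.head(d1)           # per-pixel class logits

net = TinyUNet()
print(net(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```

The convolutions here are exactly the local-feature extractors the paragraph credits CNNs with; the long-range modeling that Transformer and hybrid variants add would replace or augment the bottleneck (`enc2`) with attention layers.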


Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

arXiv.org Machine Learning

The deep learning community has proposed optimizations spanning hardware, software, and learning theory to improve the computational performance of deep learning workloads. While some of these optimizations perform the same operations faster (e.g., switching from an NVIDIA K80 to a P100), many modify the semantics of the training procedure (e.g., large-minibatch training, reduced precision), which can affect a model's generalization ability. Due to a lack of standard evaluation criteria that consider these trade-offs, it has become increasingly difficult to compare these different advances. To address this shortcoming, DAWNBench and the upcoming MLPerf benchmarks use time-to-accuracy as the primary metric for evaluation, with the accuracy threshold set close to the state of the art and measured on a held-out dataset not used in training; the goal is to train to this accuracy threshold as fast as possible. In DAWNBench, the winning entries improved time-to-accuracy on ImageNet by two orders of magnitude over the seed entries. Despite this progress, it is unclear how sensitive time-to-accuracy is to the chosen threshold, how much it varies between independent training runs, and how well models optimized for time-to-accuracy generalize. In this paper, we provide evidence that time-to-accuracy has a low coefficient of variation and that models tuned for it generalize nearly as well as pre-trained models. We additionally analyze the winning entries to understand the sources of these speedups and give recommendations for future benchmarking efforts.
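
The time-to-accuracy metric and the run-to-run variability claim are straightforward to state in code. Below is a small illustrative sketch (the epoch logs, threshold, and numbers are invented, not DAWNBench data): each run is scored by the first point at which its held-out accuracy crosses the threshold, and the coefficient of variation is computed across independent runs.

```python
# Illustrative computation of time-to-accuracy and its coefficient of
# variation across runs; all logs and values below are made up.
import statistics

def time_to_accuracy(epoch_log, threshold):
    """epoch_log: list of (elapsed_seconds, held_out_accuracy) per epoch.
    Returns the elapsed time at the first epoch reaching the threshold,
    or None if the run never gets there."""
    for elapsed, acc in epoch_log:
        if acc >= threshold:
            return elapsed
    return None

# Three hypothetical independent training runs of the same entry.
runs = [
    [(110, 0.80), (205, 0.91), (310, 0.94)],
    [(105, 0.82), (198, 0.91), (295, 0.95)],
    [( 98, 0.79), (201, 0.92), (305, 0.93)],
]

ttas = [time_to_accuracy(run, threshold=0.90) for run in runs]
cv = statistics.stdev(ttas) / statistics.mean(ttas)
print(ttas, f"CV = {cv:.3f}")  # [205, 198, 201] CV = 0.017
```

A low coefficient of variation like the one in this toy example is what makes time-to-accuracy usable as a benchmark metric: a single measured run is then a reasonable proxy for an entry's typical performance.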