Transformers in Healthcare: A Survey

Nerella, Subhash, Bandyopadhyay, Sabyasachi, Zhang, Jiaqing, Contreras, Miguel, Siegel, Scott, Bumin, Aysegul, Silva, Brandon, Sena, Jessica, Shickel, Benjamin, Bihorac, Azra, Khezeli, Kia, Rashidi, Parisa

arXiv.org Artificial Intelligence 

In contrast, transformers employ a "Scaled Dot-Product Attention" mechanism that is parallelizable. This attention mechanism allows for large-scale pretraining. Additionally, self-supervised pretraining paradigms such as masked language modeling on large unlabeled datasets enable transformers to be trained without costly annotations. Although originally designed for the NLP domain [3], transformers have been adapted to various domains such as computer vision [5, 6], remote sensing [7], time series [8], speech processing [9], and multimodal learning [10]. Consequently, modality-specific surveys have emerged in the medical domain, focusing on medical imaging [11-13] and biomedical language models [14]. This paper aims to provide a comprehensive overview of Transformer models utilized across multiple data modalities to address healthcare objectives. We also discuss pretraining strategies to manage the lack of robust and annotated healthcare datasets. The rest of the paper is organized as follows: Section 2 discusses the strategy used to search for relevant citations; Section 3 describes the architecture of the original transformer; Section 4 describes the two primary Transformer variants, the Bidirectional Encoder Representations from Transformers (BERT) and the Vision Transformer (ViT); Section 5 describes advancements in large language models (LLMs); and Sections 6 through 12 provide a review of Transformers in healthcare.
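As a point of reference for the parallelizability claim above, the sketch below is a minimal NumPy illustration of scaled dot-product attention as introduced in the original transformer paper; the function name, shapes, and variable names are our own and are not drawn from any implementation discussed in this survey. All query-key interactions are computed in a single matrix product, which is what allows the operation to be parallelized across sequence positions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Illustrative scaled dot-product attention.

    Q, K: (seq_len, d_k) query and key matrices; V: (seq_len, d_v) values.
    Every position attends to every other position via one matrix product,
    so no sequential recurrence is required.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                         # weighted sum of values
```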
