LT-ViT: A Vision Transformer for multi-label Chest X-ray classification

Marikkar, Umar, Atito, Sara, Awais, Muhammad, Mahdi, Adam

Nov-13-2023–arXiv.org Artificial Intelligence

Vision Transformers (ViTs) are widely adopted in medical imaging tasks, and some existing efforts have been directed towards vision-language training for Chest X-rays (CXRs). However, we believe there is still room for improvement in vision-only training for CXRs using ViTs by aggregating information from multiple scales, which has proven beneficial for non-transformer networks. Hence, we have developed LT-ViT, a transformer that uses combined attention between image tokens and randomly initialized auxiliary tokens that represent labels. Our experiments demonstrate that LT-ViT (1) surpasses the state-of-the-art performance using pure ViTs on two publicly available CXR datasets, (2) generalizes to other pre-training methods and is therefore agnostic to model initialization, and (3) enables model interpretability without Grad-CAM and its variants.
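The combined attention between image tokens and label tokens described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head with identity query/key/value projections, and the names `joint_attention` and `label_attention_maps` as well as the toy sizes are hypothetical. It only shows how concatenating the two token sets lets each label token attend to the image patches, and how the label-token rows of the attention matrix could be read as per-label relevance maps without a gradient-based method.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(image_tokens, label_tokens):
    """Single-head self-attention over concatenated image + label tokens.

    Assumption: identity query/key/value projections, for illustration only.
    Returns (updated_tokens, attention_matrix).
    """
    tokens = image_tokens + label_tokens  # image tokens first, then label tokens
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attn = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        attn.append(softmax(scores))
    updated = [
        [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        for weights in attn
    ]
    return updated, attn


def label_attention_maps(attn, num_patches):
    """Attention from each label token to the image patches only."""
    return [row[:num_patches] for row in attn[num_patches:]]


random.seed(0)
NUM_PATCHES, NUM_LABELS, DIM = 4, 3, 8  # toy sizes, chosen arbitrarily

# Stand-ins for patch embeddings and randomly initialized label tokens.
image_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]
label_tokens = [[random.gauss(0, 0.02) for _ in range(DIM)] for _ in range(NUM_LABELS)]

updated, attn = joint_attention(image_tokens, label_tokens)
maps = label_attention_maps(attn, NUM_PATCHES)
```

In a multi-label setting, each updated label token would typically feed its own binary classifier, so one token corresponds to one finding. How LT-ViT aggregates information across scales is not specified in this abstract and is not modeled in the sketch.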

artificial intelligence, label token, machine learning


Genre:
- Research Report (0.40)

Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.69)
  - Vision (1.00)
