Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment
Zhanghexuan Ji, Mohammad Abuzar Shaikh, Dana Moukheiber, Sargur Srihari, Yifan Peng, Mingchen Gao
arXiv.org Artificial Intelligence
Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model is pre-trained for visual-textual matching at both the global image-sentence level and the local image region-word level. Both levels are bidirectionally constrained by Cross-Entropy-based and ranking-based Triplet Matching Losses. The region-word matching is computed using an attention mechanism, without direct supervision of the region-word mapping. The pre-trained multi-modal representation learning paves the way for downstream tasks involving image and/or text encoding. We demonstrate the quality of the learned representations through cross-modality retrieval and multi-label classification on two datasets: OpenI-IU and MIMIC-CXR.
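The attention-based region-word matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature shapes, the cosine-similarity attention, the temperature value, and the triplet margin are all assumptions chosen for illustration. Each word attends over image regions, and the image-text score aggregates word-to-attended-context similarities; a ranking-based triplet loss then pushes matched pairs above mismatched ones by a margin.

```python
import numpy as np

def l2norm(x, eps=1e-8):
    # Normalize feature vectors to unit length along the last axis.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def region_word_score(regions, words, temperature=4.0):
    """Illustrative attention-based region-word matching score.

    regions: (R, d) image-region features; words: (W, d) word features.
    Each word attends over the R regions; the image-text score is the
    mean cosine similarity between each word and its attended context.
    No direct region-word supervision is used.
    """
    r = l2norm(regions)
    w = l2norm(words)
    sim = w @ r.T                              # (W, R) word-region similarities
    attn = softmax(temperature * sim, axis=1)  # attention weights over regions
    context = attn @ regions                   # (W, d) attended region context
    return float(np.mean(np.sum(l2norm(context) * w, axis=1)))

def triplet_matching_loss(pos_score, neg_score, margin=0.2):
    """Ranking-based triplet loss: a matched image-text pair should
    outscore a mismatched pair by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)
```

In the bidirectional setting described above, the same loss would be applied symmetrically: once with a hard-negative text for a given image, and once with a hard-negative image for a given text.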
Sep-4-2021