chexzero
TIER: Text-Image Entropy Regularization for CLIP-style models
In this paper, we introduce a novel regularization scheme for contrastive language-image pre-trained (CLIP) medical vision models. Our approach is based on the observation that, in many medical imaging tasks, text tokens should describe only a small number of image regions and, likewise, each image region should correspond to only a few text tokens. In CLIP-style models, this implies that text-token embeddings should have high similarity to only a small number of image-patch embeddings for a given image-text pair. We formalize this observation using a regularization scheme that penalizes the entropy of the text-token to image-patch similarity scores. We demonstrate qualitatively and quantitatively that the proposed regularization shrinks most of the pairwise text-token and image-patch similarity scores towards zero, thus achieving the desired effect. We demonstrate the promise of our approach in an important medical context, chest X-rays, where this underlying sparsity hypothesis naturally arises. Using our proposed approach, we achieve state-of-the-art (SOTA) average zero-shot performance on the CheXpert and PadChest chest X-ray datasets, outperforming an unregularized version of the model and several recently published self-supervised models.
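The entropy penalty described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only states that the entropy of the text-token to image-patch similarity scores is penalized, so the details here are assumptions: cosine similarity as the score, a softmax with a hypothetical `temperature` parameter to turn each row or column of scores into a distribution, and a simple average over tokens and over patches. All function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; the temperature is an assumption."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_penalty(token_embs, patch_embs, temperature=0.1):
    """Return (token_term, patch_term) for one image-text pair.

    token_term: mean entropy of each token's distribution over patches.
    patch_term: mean entropy of each patch's distribution over tokens.
    Low values mean each token (or patch) attends to few patches (or tokens).
    """
    sim = [[cosine_similarity(t, p) for p in patch_embs] for t in token_embs]
    token_term = sum(entropy(softmax(row, temperature)) for row in sim) / len(sim)
    columns = list(zip(*sim))  # transpose: one column of scores per patch
    patch_term = sum(entropy(softmax(col, temperature)) for col in columns) / len(columns)
    return token_term, patch_term


# Toy example: two tokens that each match exactly one of two patches.
tokens = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.0, 1.0]]
token_term, patch_term = entropy_penalty(tokens, patches)
print(token_term, patch_term)
```

In training, such terms would be added to the contrastive loss with weighting coefficients, which are also hypothetical here. A token whose similarity is spread evenly over all patches yields the maximum entropy, while a token that matches a single patch yields an entropy near zero, which is the sparsity the abstract aims for.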
CheXzero: Detect Pathologies From Unannotated X-ray Images
This article was published as a part of the Data Science Blogathon. Working on a task that involves interpreting chest X-ray images, with no labeled data at your disposal? Researchers from Harvard Medical School and Stanford University have devised an artificial intelligence diagnostic tool that can detect diseases on chest X-rays from natural-language descriptions, without needing labeled data. This is a major step forward in clinical AI design, because most existing models require vast amounts of annotated data for training. This article looks at the proposed method in further detail.
No labels? No problem!
Harvard Medical School scientists and colleagues at Stanford University have developed an artificial intelligence diagnostic tool that can detect diseases on chest X-rays directly from natural-language descriptions contained in accompanying clinical reports. The step is deemed a major advance in clinical AI design because most current AI models require laborious human annotation of vast reams of data before the labeled data are fed into the model to train it. A report on the work, published Sept. 15 in Nature Biomedical Engineering, shows that the model, called CheXzero, performed on par with human radiologists in its ability to detect pathologies on chest X-rays. The team has made the code for the model publicly available for other researchers. Most AI models require labeled datasets during their "training" so they can learn to correctly identify pathologies. This process is especially burdensome for medical image-interpretation tasks since it involves large-scale annotation by human clinicians, which is often expensive and time-consuming.