Tinn, Robert
Exploring the Boundaries of GPT-4 in Radiology
Liu, Qianchu, Hyland, Stephanie, Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Wetscherek, Maria Teodora, Tinn, Robert, Sharma, Harshita, Pérez-García, Fernando, Schwaighofer, Anton, Rajpurkar, Pranav, Khanna, Sameer Tajdin, Poon, Hoifung, Usuyama, Naoto, Thieme, Anja, Nori, Aditya V., Lungren, Matthew P., Oktay, Ozan, Alvarez-Valle, Javier
The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM to date, on text-based applications for radiology reports, comparing it against state-of-the-art (SOTA) radiology-specific models. Exploring various prompting strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and found that it either outperforms or is on par with current SOTA radiology models. With zero-shot prompting, GPT-4 already obtains substantial gains ($\approx$ 10% absolute improvement) over radiology models in temporal sentence similarity classification (accuracy) and natural language inference ($F_1$). For tasks that require learning dataset-specific style or schema (e.g. findings summarisation), GPT-4 improves with example-based prompting and matches supervised SOTA. Our extensive error analysis with a board-certified radiologist shows that GPT-4 has a sufficient level of radiology knowledge, with only occasional errors in complex contexts that require nuanced domain knowledge. For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually written impressions.
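To illustrate the difference between the zero-shot and example-based prompting setups compared in the abstract, here is a minimal sketch of how such prompts can be assembled for findings summarisation. The prompt wording, template names, and helper functions are hypothetical illustrations, not the prompts used in the paper.

```python
# Minimal sketch: zero-shot vs. example-based (few-shot) prompts for
# findings summarisation. All wording here is a placeholder, not the
# paper's actual prompt text.

ZERO_SHOT_TEMPLATE = (
    "You are a radiologist. Summarise the following findings section "
    "into an impression.\n\nFindings: {findings}\nImpression:"
)

FEW_SHOT_TEMPLATE = (
    "You are a radiologist. Summarise the findings into an impression, "
    "matching the style of the examples.\n\n"
    "{examples}\n"
    "Findings: {findings}\nImpression:"
)

def build_zero_shot_prompt(findings: str) -> str:
    """Zero-shot: the model sees only the task description and the input."""
    return ZERO_SHOT_TEMPLATE.format(findings=findings)

def build_few_shot_prompt(findings: str, examples: list[tuple[str, str]]) -> str:
    """Example-based: (findings, impression) pairs are prepended so the model
    can pick up the dataset-specific style or schema from in-context examples."""
    example_block = "\n".join(
        f"Findings: {f}\nImpression: {i}\n" for f, i in examples
    )
    return FEW_SHOT_TEMPLATE.format(examples=example_block, findings=findings)
```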
Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing
Zhang, Sheng, Xu, Yanbo, Usuyama, Naoto, Bagga, Jaspreet, Tinn, Robert, Preston, Sam, Rao, Rajesh, Wei, Mu, Valluri, Naveen, Wong, Cliff, Lungren, Matthew P., Naumann, Tristan, Poon, Hoifung
Contrastive pretraining on parallel image-text data has attained great success in vision-language processing (VLP), as exemplified by CLIP and related methods. However, prior explorations tend to focus on general domains on the web. Biomedical images and text are rather different, yet publicly available datasets are small and skewed toward chest X-rays, severely limiting progress. In this paper, we conducted by far the largest study on biomedical VLP, using 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. Our dataset (PMC-15M) is two orders of magnitude larger than existing biomedical image-text datasets such as MIMIC-CXR, and spans a diverse range of biomedical images. The standard CLIP method is suboptimal for the biomedical domain, so we propose BiomedCLIP, with domain-specific adaptations tailored to biomedical VLP. We conducted extensive experiments and ablation studies on standard biomedical imaging tasks, from retrieval to classification to visual question answering (VQA). BiomedCLIP established a new state of the art on a wide range of standard datasets, substantially outperforming prior VLP approaches. Surprisingly, BiomedCLIP even outperformed radiology-specific state-of-the-art models such as BioViL on radiology-specific tasks such as RSNA pneumonia detection, highlighting the utility of large-scale pretraining across all biomedical image types. We will release our models at https://aka.ms/biomedclip to facilitate future research in biomedical VLP.
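For readers unfamiliar with the contrastive pretraining recipe the abstract builds on, below is a minimal sketch of the symmetric image-text contrastive objective used in CLIP-style training. The function name, tensor shapes, and temperature value are illustrative assumptions; this is not the released BiomedCLIP implementation or its domain-specific adaptations.

```python
# Minimal sketch of a CLIP-style symmetric contrastive loss over a batch of
# paired image/caption embeddings. Names and defaults are placeholders.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """image_emb, text_emb: (batch, dim) embeddings of paired figure-caption data."""
    # Normalise so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: match each image to its caption and vice versa.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```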