A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs
McConnell, Niccolò, Vasudev, Pardeep, Yamada, Daisuke, Cheng, Daryl, Azimbagirad, Mehran, McCabe, John, Aslani, Shahab, Shahin, Ahmed H., Zhou, Yukun, Consortium, The SUMMIT, Altmann, Andre, Hu, Yipeng, Taylor, Paul, Janes, Sam M., Alexander, Daniel C., Jacob, Joseph
–arXiv.org Artificial Intelligence
Summit Consortium authors and affiliations listed at end of file. Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves strong performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
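The masked autoencoder idea mentioned in the abstract can be illustrated with a minimal sketch of its first step: splitting a 3D volume into non-overlapping patches and hiding a random subset, which the model then learns to reconstruct. This is not the authors' implementation. The volume shape, patch size, mask ratio, and the function names `patch_grid` and `random_mask` are all illustrative assumptions and are not taken from the paper.

```python
import itertools
import random


def patch_grid(volume_shape, patch_size):
    """Return the (z, y, x) grid index of every non-overlapping 3D patch."""
    if any(dim % p for dim, p in zip(volume_shape, patch_size)):
        raise ValueError("volume_shape must be divisible by patch_size")
    counts = [dim // p for dim, p in zip(volume_shape, patch_size)]
    return list(itertools.product(*(range(c) for c in counts)))


def random_mask(patches, mask_ratio, seed=None):
    """Randomly split patch indices into (visible, masked) lists."""
    rng = random.Random(seed)
    shuffled = patches[:]
    rng.shuffle(shuffled)
    n_masked = int(len(shuffled) * mask_ratio)
    return shuffled[n_masked:], shuffled[:n_masked]


# Hypothetical values: a 128 x 256 x 256 voxel scan, 16^3 patches,
# and a 75% mask ratio. None of these come from the paper.
patches = patch_grid((128, 256, 256), (16, 16, 16))
visible, masked = random_mask(patches, mask_ratio=0.75, seed=0)
print(len(patches), len(visible), len(masked))  # 2048 512 1536
```

In a masked autoencoder, only the visible patches are passed through the encoder, which is one reason this style of pretraining can be comparatively cheap; whether and how TANGERINE exploits this is not stated in the excerpt above.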
National lung cancer screening (LCS) programs herald a generational opportunity to identify early pre-symptomatic disease phenotypes for some of the most common chronic respiratory diseases in the world. In contrast, LCS programmes afford the opportunity to detect preclinical stages of airways or interstitial lung damage, where imaging abnormalities are radiologically visible despite lung function tests remaining normal. Moreover, methods have often relied on patch-based approaches that risk losing contextual information and require prior knowledge of disease location for model development. These limitations constrain the utility of such models in research and clinical environments, where computational resources are often limited. Hence, there remains a pressing need for foundation models that are not only accurate and generalisable, but also lightweight, open-access, and computationally efficient, enabling fine-tuning with limited data and resources.
Jul-17-2025
- Country:
  - North America > United States (0.14)
  - Europe
    - Russia (0.04)
    - Norway (0.04)
    - Netherlands (0.04)
    - United Kingdom > England
      - Greater London > London (0.05)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - Italy > Umbria
      - Perugia Province > Perugia (0.04)
  - Asia
    - Russia (0.04)
    - China (0.04)
    - Middle East
      - Iran (0.04)
      - Republic of Türkiye > Istanbul Province
        - Istanbul (0.04)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)