Gonzalez-Jimenez, Alvaro
Towards Scalable Foundation Models for Digital Dermatology
Gröger, Fabian, Gottfrois, Philippe, Amruthalingam, Ludovic, Gonzalez-Jimenez, Alvaro, Lionetti, Simone, Soenksen-Martinez, Luis R., Navarini, Alexander A., Pouly, Marc
The growing demand for accurate and equitable AI models in digital dermatology faces a significant challenge: the lack of diverse, high-quality labeled data. In this work, we investigate the potential of domain-specific foundation models for dermatology in addressing this challenge. We utilize self-supervised learning (SSL) techniques to pre-train models on a dataset of over 240,000 dermatological images from public and private collections. Our study considers several SSL methods and compares the resulting foundation models against domain-agnostic models, such as those pre-trained on ImageNet, and state-of-the-art models such as MONET across 12 downstream tasks. Unlike previous research, we emphasize the development of smaller models that are more suitable for resource-limited clinical settings, facilitating easier adaptation to a broad range of use cases. Results show that the models pre-trained in this work not only outperform general-purpose models but also approach the performance of models 50 times larger on clinically relevant diagnostic tasks. To promote further research in this direction, we publicly release both the training code and the foundation models, which can support clinicians and researchers in dermatological applications.
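A standard way to compare pre-trained backbones on downstream tasks, as this abstract describes, is linear probing: the encoder is frozen and only a linear classifier is trained on its embeddings. The sketch below illustrates that protocol on synthetic stand-in embeddings; in practice these would come from the released foundation models applied to a downstream dermatology dataset, and all data here is illustrative.

```python
# Hedged sketch: evaluating a frozen foundation-model backbone with a
# linear probe, a common protocol for comparing SSL pre-training methods.
# The embeddings below are synthetic stand-ins, not real model outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "image embeddings": two classes shifted apart in feature space.
n, dim = 400, 64
labels = rng.integers(0, 2, size=n)
embeddings = rng.normal(size=(n, dim)) + labels[:, None] * 2.0

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, labels, test_size=0.25, random_state=0
)

# Linear probe: a logistic-regression head trained on frozen features.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
accuracy = probe.score(X_te, y_te)
print(f"linear-probe accuracy: {accuracy:.2f}")
```

Probe accuracy on each of the 12 downstream tasks would then serve as the comparison metric between SSL methods and the domain-agnostic baselines.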
Towards Reliable Dermatology Evaluation Benchmarks
Gröger, Fabian, Lionetti, Simone, Gottfrois, Philippe, Gonzalez-Jimenez, Alvaro, Groh, Matthew, Daneshjou, Roxana, Labelling Consortium, Navarini, Alexander A., Pouly, Marc
Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data-cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset, which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.
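One algorithmic step typical of such cleaning pipelines is flagging near duplicates by comparing image embeddings; the candidates are then passed to dermatologists for confirmation. The sketch below is a minimal illustration of that idea using cosine similarity on synthetic embeddings; the threshold, embeddings, and function name are assumptions for illustration, not the paper's exact method.

```python
# Hedged sketch: flagging candidate near-duplicate images by cosine
# similarity of their embeddings, one algorithmic step in a cleaning
# pipeline. Embeddings and the 0.95 threshold are illustrative choices.
import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95):
    """Return index pairs whose cosine similarity exceeds the threshold."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    pairs = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                pairs.append((i, j))
    return pairs

rng = np.random.default_rng(1)
base = rng.normal(size=(5, 32))
# Append a slightly perturbed copy of image 0 to simulate a near duplicate.
embeddings = np.vstack([base, base[0] + 0.01 * rng.normal(size=32)])
print(find_near_duplicates(embeddings))  # the perturbed copy pairs with index 0
```

Flagged pairs would then go through the confirmation process described above, with the stopping criterion deciding when enough candidates have been reviewed.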