Goto

Collaborating Authors

 Pouly, Marc


Towards Scalable Foundation Models for Digital Dermatology

arXiv.org Artificial Intelligence

The growing demand for accurate and equitable AI models in digital dermatology faces a significant challenge: the lack of diverse, high-quality labeled data. In this work, we investigate the potential of domain-specific foundation models for dermatology in addressing this challenge. We utilize self-supervised learning (SSL) techniques to pre-train models on a dataset of over 240,000 dermatological images from public and private collections. Our study considers several SSL methods and compares the resulting foundation models against domain-agnostic models like those pre-trained on ImageNet and state-of-the-art models such as MONET across 12 downstream tasks. Unlike previous research, we emphasize the development of smaller models that are more suitable for resource-limited clinical settings, facilitating easier adaptation to a broad range of use cases. Results show that models pre-trained in this work not only outperform general-purpose models but also approach the performance of models 50 times larger on clinically relevant diagnostic tasks. To promote further research in this direction, we publicly release both the training code and the foundation models, which can benefit clinicians in dermatological applications.


Estimating Text Similarity based on Semantic Concept Embeddings

arXiv.org Artificial Intelligence

Due to their ease of use and high accuracy, Word2Vec (W2V) word embeddings enjoy great success in the semantic representation of words, sentences, and whole documents as well as for semantic similarity estimation. However, they have the shortcoming that they are directly extracted from a surface representation, which does not adequately represent human thought processes and also performs poorly for highly ambiguous words. Therefore, we propose Semantic Concept Embeddings (CE) based on the MultiNet Semantic Network (SN) formalism, which addresses both shortcomings. The evaluation on a marketing target group distribution task showed that the accuracy of predicted target groups can be increased by combining traditional word embeddings with semantic CEs.


Towards Reliable Dermatology Evaluation Benchmarks

arXiv.org Artificial Intelligence

Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data-cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.


Assessing Guest Nationality Composition from Hotel Reviews

arXiv.org Artificial Intelligence

Many hotels target guest acquisition efforts to specific markets in order to best anticipate individual preferences and needs of their guests. Likewise, such strategic positioning is a prerequisite for efficient marketing budget allocation. Official statistics report on the number of visitors from different countries, but no fine-grained information on the guest composition of individual businesses exists. There is, however, growing interest in such data from competitors, suppliers, researchers and the general public. We demonstrate how machine learning can be leveraged to extract references to guest nationalities from unstructured text reviews in order to dynamically assess and monitor the dynamics of guest composition of individual businesses. In particular, we show that a rather simple architecture of pre-trained embeddings and stacked LSTM layers provides a better performance-runtime tradeoff than more complex state-of-the-art language models.


Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints

arXiv.org Artificial Intelligence

Cable trees are widely used in industrial products to transmit energy and information between different product parts. For example, cable trees are needed in cars to automate many previously mechanical functions such as moving seats or opening windows and to add new functions such as a voice-controlled navigation or an onboard entertainment system. It is thus not surprising that for example a car like the VW Golf 7 contains 14 cable trees with a total of 1633 cables. The manufacturing of cable trees usually relies on cheap manual labour performed in low-cost countries where humans plug cables into harnesses following a wiring plan. Only few automated manufacturing solutions exist, which rely on complex robotic machines. These machines execute a sequence of wiring operations that highly qualified technicians develop by analyzing the wiring plan. With the continuing tendency towards customer-specific and resource-efficient justin-time manufacturing, smaller batch sizes of cable trees need to be manufactured requiring a frequent change of wiring plans, for which wiring sequences should be derived instantly. Scaling up human expertise to such frequent changes is simply impossible, which explains a growing interest in the intelligent automated manufacturing of cable trees. This interest is also nourished by a further miniaturization of cable harnesses, which will make their manual manufacturing impossible.