Block Expanded DINORET: Adapting Natural Domain Foundation Models for Retinal Imaging Without Catastrophic Forgetting

Zoellin, Jay, Merk, Colin, Buob, Mischa, Saad, Amr, Giesser, Samuel, Spitznagel, Tahm, Turgut, Ferhat, Santos, Rui, Zhou, Yukun, Wagner, Sigfried, Keane, Pearse A., Tham, Yih Chung, DeBuc, Delia Cabrera, Becker, Matthias D., Somfai, Gabor M.

arXiv.org Artificial Intelligence

Integrating deep learning into medical imaging is poised to greatly advance diagnostic methods, but it faces challenges with generalizability. Foundation models, based on self-supervised learning, address these issues and improve data efficiency. Natural domain foundation models show promise for medical imaging, but systematic research evaluating domain adaptation, especially using self-supervised learning and parameter-efficient fine-tuning, remains underexplored. Additionally, little research addresses the issue of catastrophic forgetting during fine-tuning of foundation models. We adapted the DINOv2 vision transformer for retinal imaging classification tasks using self-supervised learning and generated two novel foundation models termed DINORET and BE DINORET. Publicly available color fundus photographs were employed for model development and subsequent fine-tuning for diabetic retinopathy staging and glaucoma detection. We introduced block expansion as a novel domain adaptation strategy and assessed the models for catastrophic forgetting. Models were benchmarked against RETFound, a state-of-the-art foundation model in ophthalmology. DINORET and BE DINORET demonstrated competitive performance on retinal imaging tasks, with the block-expanded model achieving the highest scores on most datasets. Block expansion successfully mitigated catastrophic forgetting. Our few-shot learning studies indicated that DINORET and BE DINORET outperform RETFound in terms of data efficiency. This study highlights the potential of adapting natural domain vision models to retinal imaging using self-supervised learning and block expansion. BE DINORET offers robust performance without sacrificing previously acquired capabilities. Our findings suggest that these methods could enable healthcare institutions to develop tailored vision models for their patient populations, enhancing global healthcare inclusivity.
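The forgetting-avoidance property of block expansion can be illustrated with a toy residual stack. This is a sketch of the general idea only, not the DINORET code (block and function names are illustrative): inserted blocks start with zero weights, so at initialization the expanded network computes exactly the same function as the original, preserving previously acquired capabilities.

```python
# Toy block expansion: insert zero-initialized residual blocks so the
# expanded model initially matches the original function exactly.

def residual_block(weight):
    # A toy residual block: x -> x + weight * x
    return lambda x: x + weight * x

def expand(blocks, every=2):
    """Insert an identity-initialized block after every `every` blocks."""
    expanded = []
    for i, blk in enumerate(blocks, 1):
        expanded.append(blk)
        if i % every == 0:
            expanded.append(residual_block(0.0))  # zero weight => identity
    return expanded

def forward(blocks, x):
    for blk in blocks:
        x = blk(x)
    return x

original = [residual_block(0.1), residual_block(0.2)]
expanded = expand(original, every=2)
# At initialization the expanded model matches the original exactly;
# only the new blocks are then trained on the target domain.
assert forward(expanded, 1.0) == forward(original, 1.0)
```

During adaptation, only the inserted blocks would receive gradient updates, which is what keeps the pretrained weights, and hence the natural-domain capabilities, intact.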


Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images

Contreras, Nohemi Sofia Leon, D'Amato, Marina, Ciompi, Francesco, Grisi, Clement, Aswolinskiy, Witali, Vatrano, Simona, Fraggetta, Filippo, Nagtegaal, Iris

arXiv.org Artificial Intelligence

Training neural networks with high-quality pixel-level annotation in histopathology whole-slide images (WSI) is an expensive process due to gigapixel resolution of WSIs. However, recent advances in self-supervised learning have shown that highly descriptive image representations can be learned without the need for annotations. We investigate the application of the recent Hierarchical Image Pyramid Transformer (HIPT) model for the specific task of classification of colorectal biopsies and polyps. After evaluating the effectiveness of TCGA-learned features in the original HIPT model, we incorporate colon biopsy image information into HIPT's pretraining using two distinct strategies: (1) fine-tuning HIPT from the existing TCGA weights and (2) pretraining HIPT from random weight initialization. We compare the performance of these pretraining regimes on two colorectal biopsy classification tasks: binary and multiclass classification.


Classification of Prostate Cancer in 3D Magnetic Resonance Imaging Data based on Convolutional Neural Networks

Rippa, Malte, Schulze, Ruben, Himstedt, Marian, Burn, Felice

arXiv.org Artificial Intelligence

Prostate cancer is a commonly diagnosed cancerous disease among men worldwide. Even with modern technology such as multi-parametric magnetic resonance tomography and guided biopsies, the process of diagnosing prostate cancer remains time-consuming and requires highly trained professionals. In this paper, different convolutional neural networks (CNNs) are evaluated on their ability to reliably classify whether an MRI sequence contains malignant lesions. Implementations of a ResNet, a ConvNet, and a ConvNeXt for 3D image data are trained and evaluated. The models are trained using different data augmentation techniques, learning rates, and optimizers. The data is taken from a private dataset provided by Cantonal Hospital Aarau. The best result was achieved by a ResNet3D, yielding an average precision score of 0.4583 and an AUC ROC score of 0.6214.
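The two metrics reported above are both ranking-based. As a reminder of what they measure, here is a minimal pure-Python sketch (illustrative only, not the paper's evaluation code): AUC ROC is the probability that a random positive outranks a random negative, and average precision sums precision at each positive hit.

```python
def auc_roc(labels, scores):
    # Probability a random positive scores above a random negative
    # (ties count as half a win).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    # Mean of precision@k over the ranks k where a positive appears.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, 1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank
    return ap / sum(labels)

labels, scores = [1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2]
print(auc_roc(labels, scores), average_precision(labels, scores))
```

On this toy example the AUC ROC is 0.75 (three of four positive-negative pairs are ranked correctly) while the average precision is 5/6, which shows why the two numbers reported for the ResNet3D can differ substantially on imbalanced data.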


On Fixing the Right Problems in Predictive Analytics: AUC Is Not the Problem

Baker, Ryan S., Bosch, Nigel, Hutt, Stephen, Zambrano, Andres F., Bowers, Alex J.

arXiv.org Artificial Intelligence

Recently, ACM FAccT published an article by Kwegyir-Aggrey and colleagues (2023), critiquing the use of AUC ROC in predictive analytics in several domains. In this article, we offer a critique of that article. Specifically, we highlight technical inaccuracies in that paper's comparison of metrics, mis-specification of the interpretation and goals of AUC ROC, the article's use of the accuracy metric as a gold standard for comparison to AUC ROC, and the article's application of critiques solely to AUC ROC for concerns that would apply to the use of any metric. We conclude with a re-framing of the very valid concerns raised in that article, and discuss how the use of AUC ROC can remain a valid and appropriate practice in a well-informed predictive analytics approach that takes those concerns into account. Finally, we discuss the combined use of multiple metrics, including machine learning bias metrics, and AUC ROC's place in such an approach. Like broccoli, AUC ROC is healthy; but also like broccoli, researchers and practitioners in our field shouldn't eat a diet of only AUC ROC.
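One concrete difference between the two metrics at issue, sketched here as a toy illustration (not drawn from either article's analysis): accuracy depends on the decision threshold, so any monotone rescaling of the scores can change it, while AUC ROC depends only on the ranking of scores and is unchanged.

```python
def accuracy(labels, scores, threshold=0.5):
    # Accuracy at a fixed threshold: sensitive to the score scale.
    return sum((s >= threshold) == y for y, s in zip(labels, scores)) / len(labels)

def auc_roc(labels, scores):
    # Ranking-based: invariant to monotone transformations of the scores.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.1]
rescaled = [s / 2 for s in scores]  # monotone transform: same ranking
assert auc_roc(labels, scores) == auc_roc(labels, rescaled) == 1.0
assert accuracy(labels, scores) != accuracy(labels, rescaled)
```

This is one reason treating accuracy as a gold standard for judging AUC ROC is problematic: the two metrics answer different questions about the same scores.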


Inadequacy of common stochastic neural networks for reliable clinical decision support

Lindenmeyer, Adrian, Blattmann, Malte, Franke, Stefan, Neumuth, Thomas, Schneider, Daniel

arXiv.org Artificial Intelligence

Widespread adoption of AI for medical decision making is still hindered by ethical and safety-related concerns. For AI-based decision support systems in healthcare settings, it is paramount to be reliable and trustworthy. Common deep learning approaches, however, tend towards overconfidence under data shift. Such inappropriate extrapolation beyond evidence-based scenarios may have dire consequences. This highlights the importance of reliable estimation of local uncertainty and its communication to the end user. While stochastic neural networks have been heralded as a potential solution to these issues, this study investigates their actual reliability in clinical applications. We centered our analysis on the exemplary use case of mortality prediction for ICU hospitalizations using EHRs from the MIMIC-III study. For predictions on the EHR time series, encoder-only Transformer models were employed. Stochasticity of model functions was achieved by incorporating common methods such as Bayesian neural network layers and model ensembles. Our models achieve state-of-the-art performance in terms of discrimination (AUC ROC: 0.868±0.011, AUC PR: 0.554±0.034) and calibration on the mortality prediction benchmark. However, epistemic uncertainty is critically underestimated by the selected stochastic deep learning methods. A heuristic proof for the responsible collapse of the posterior distribution is provided. Our findings reveal the inadequacy of commonly used stochastic deep learning approaches to reliably recognize OoD samples. In both methods, unsubstantiated model confidence is not prevented due to strongly biased functional posteriors, rendering them inappropriate for reliable clinical decision support. This highlights the need for approaches with more strictly enforced or inherent distance awareness to known data points, e.g., using kernel-based techniques.
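The posterior-collapse failure mode can be made concrete with the standard ensemble decomposition of predictive uncertainty. This is a sketch of the general technique, not the paper's code: epistemic uncertainty is measured as the mutual information between the prediction and the ensemble member, i.e., the entropy of the mean prediction minus the mean entropy of the members.

```python
import math

def entropy(p):
    # Shannon entropy of a categorical distribution (natural log).
    return -sum(q * math.log(q) for q in p if q > 0)

def epistemic_uncertainty(member_probs):
    # Mutual information: H(mean of p) - mean of H(p). Zero when all
    # ensemble members agree, large when they disagree.
    mean = [sum(col) / len(member_probs) for col in zip(*member_probs)]
    return entropy(mean) - sum(entropy(p) for p in member_probs) / len(member_probs)

# A collapsed posterior looks like identical members: epistemic
# uncertainty vanishes even if the input is far from the training data.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
# Disagreeing members yield high epistemic uncertainty, flagging a
# possible out-of-distribution input.
disagree = [[0.95, 0.05], [0.5, 0.5], [0.05, 0.95]]
assert abs(epistemic_uncertainty(agree)) < 1e-12
assert epistemic_uncertainty(disagree) > 0.1
```

The paper's finding is that trained Bayesian layers and ensembles behave like the `agree` case even off-distribution, which is precisely why their epistemic uncertainty estimates cannot be trusted for OoD detection.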


Temporal Shift -- Multi-Objective Loss Function for Improved Anomaly Fall Detection

Denkovski, Stefan, Khan, Shehroz S., Mihailidis, Alex

arXiv.org Artificial Intelligence

Falls are a major cause of injuries and deaths among older adults worldwide. Accurate fall detection can help reduce potential injuries and additional health complications. Different types of video modalities can be used in a home setting to detect falls, including RGB, Infrared, and Thermal cameras. Anomaly detection frameworks using autoencoders and their variants can be used for fall detection due to the data imbalance that arises from the rarity and diversity of falls. However, the use of reconstruction error in autoencoders can limit the application of networks' structures that propagate information. In this paper, we propose a new multi-objective loss function called Temporal Shift, which aims to predict both future and reconstructed frames within a window of sequential frames. The proposed loss function is evaluated on a semi-naturalistic fall detection dataset containing multiple camera modalities. The autoencoders were trained on normal activities of daily living (ADL) performed by older adults and tested on ADLs and falls performed by young adults. Temporal Shift shows significant improvement over a baseline 3D convolutional autoencoder, an attention U-Net CAE, and a multi-modal neural network. The greatest improvement was observed in the attention U-Net model, which improved by 0.20 AUC ROC for a single camera compared to reconstruction alone. With significant improvements across different models, this approach has the potential to be widely adopted and to improve anomaly detection capabilities in settings beyond fall detection.
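A multi-objective loss of this shape can be sketched as a weighted sum of a reconstruction term and a future-prediction term. This is a minimal illustration of the idea, not the authors' implementation; the weight `alpha` and function names are assumptions.

```python
def mse(a, b):
    # Mean squared error between two flattened frame sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def temporal_shift_loss(recon, window, pred, future, alpha=0.5):
    # alpha trades off reconstructing the current window against
    # predicting the frames shifted forward in time.
    return alpha * mse(recon, window) + (1 - alpha) * mse(pred, future)

# Perfect reconstruction but an imperfect future prediction still
# incurs a penalty, forcing the network to model temporal dynamics.
loss = temporal_shift_loss(recon=[1, 2], window=[1, 2],
                           pred=[3, 3], future=[3, 4])
print(loss)
```

During inference, a high combined error on a window would mark it as anomalous (a potential fall), exactly as with a reconstruction-only autoencoder but with the prediction term adding sensitivity to abnormal motion.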


Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach

Rožanec, Jože M., Petelin, Gašper, Costa, João, Bertalanič, Blaž, Cerar, Gregor, Guček, Marko, Papa, Gregor, Mladenić, Dunja

arXiv.org Artificial Intelligence

In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demands, power consumption for home appliances being turned on and off, impurities measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliances classification, the weighted average of Precision, Recall, F1, and AUC ROC was increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference from other models was shown to be statistically significant.
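The two-fold (hurdle-style) structure can be sketched as follows; this is a toy illustration of the general pattern, not the paper's models. Stage one predicts whether the target is zero, stage two predicts its magnitude, and the stand-in stage models here are deliberately trivial.

```python
class TwoFoldModel:
    """Hurdle-style model for zero-inflated targets (toy stand-ins)."""

    def fit(self, X, y):
        nonzero = [(x, v) for x, v in zip(X, y) if v != 0]
        # Stage 1 stand-in: prior probability that a target is nonzero
        # (a real system would train a classifier on X here).
        self.p_nonzero = len(nonzero) / len(y)
        # Stage 2 stand-in: mean of the nonzero targets
        # (a real system would train a regressor on the nonzero subset).
        self.mean_nonzero = sum(v for _, v in nonzero) / max(len(nonzero), 1)
        return self

    def predict(self, X):
        # Expected value = P(nonzero) * E[value | nonzero].
        return [self.p_nonzero * self.mean_nonzero for _ in X]

model = TwoFoldModel().fit([0, 1, 2, 3], [0, 0, 4, 8])
print(model.predict([5]))
```

The key benefit, which the trivial stand-ins already show, is that the regressor only ever sees nonzero targets, so its loss and its evaluation metrics are not dominated by the mass of zeroes.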


Is it worth it? Comparing six deep and classical methods for unsupervised anomaly detection in time series

Rewicki, Ferdinand, Denzler, Joachim, Niebling, Julia

arXiv.org Artificial Intelligence

The detection of anomalies, or observations that significantly deviate from what is considered normal [1], in time series data is essential in various fields, including healthcare [2], cybersecurity [3, 4], industry [5], and robotics [6]. Anomaly detection is a notoriously challenging task, as the definition of what is considered anomalous can vary based on the context or application [7]. Moreover, the absence of labeled training data for non-academic problems often precludes the use of supervised machine learning techniques. Anomaly detection is frequently required in data streams, where results must be delivered rapidly while still detecting anomalies accurately and efficiently. It is important to minimize false positive detections to prevent alarm fatigue, which can result in a serious problem being overlooked due to excessive false alarms [7]. It is also necessary to choose the appropriate method based on the application and, often, domain knowledge, as the existence of a universal anomaly detection method is a myth [8]. Choosing the appropriate method from the plethora of available options can be a challenge in itself, as different methods have different strengths in detecting certain types of anomalies. The numerous available methods can be categorized using various criteria, such as the underlying probabilistic, classification, or reconstruction-based model [1], the type of input data (univariate or multivariate), the need for labeled training data, or the ability to process data streams. In this work, we compare six unsupervised anomaly detection methods with varying complexities.
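At the classical end of the complexity spectrum mentioned above sits the simplest probabilistic detector: flag points more than k standard deviations from the series mean. This sketch is a generic baseline for illustration, not one of the six methods compared in the paper.

```python
def zscore_anomalies(series, k=3.0):
    # Flag indices whose values lie more than k population standard
    # deviations from the series mean (the classic "3-sigma" rule).
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    return [i for i, x in enumerate(series) if abs(x - mean) > k * std]

# A single large spike in an otherwise flat series is flagged.
print(zscore_anomalies([0] * 20 + [100]))
```

Unsupervised baselines like this are cheap and stream-friendly, which is exactly why comparisons against deep methods need to quantify whether the added complexity is "worth it".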


Potential Penetrative Pass (P3)

Sotudeh, Hadi

arXiv.org Artificial Intelligence

To score goals in football, a team needs to move forward on the pitch (Michalczyk, 2020), and there are various ways to do so. Depending on the game plan and philosophy, some teams prefer to play long balls from the wings or from defense, such as Burnley FC. Others prefer to penetrate in depth with passes and outplay the opposing players, such as Chelsea FC. To evaluate, objectively and in an automated way, how often teams play penetrative passes relative to the number of times they had the potential to do so, I introduce the concept of the Potential Penetrative Pass (P3) in this study.


Active Learning for Automated Visual Inspection of Manufactured Products

Trajkova, Elena, Rožanec, Jože M., Dam, Paulien, Fortuna, Blaž, Mladenić, Dunja

arXiv.org Artificial Intelligence

Quality control is a key activity performed by manufacturing enterprises to ensure products meet quality standards and to avoid potential damage to the brand's reputation. The decreased cost of sensors and connectivity has enabled an increasing digitalization of manufacturing. In addition, artificial intelligence enables higher degrees of automation, reducing the overall costs and time required for defect inspection. In this research, we compare three active learning approaches and five machine learning algorithms applied to visual defect inspection with real-world data provided by Philips Consumer Lifestyle BV. Our results show that active learning reduces the data labeling effort without detriment to the models' performance.
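One common active learning approach is pool-based uncertainty sampling: repeatedly send to the human labeler the unlabeled image whose predicted defect probability is closest to 0.5. The sketch below illustrates the selection loop only; the probability scorer and item names are toy stand-ins, not the approaches compared in this research.

```python
def most_uncertain(pool, predict_proba):
    # Uncertainty sampling: pick the item whose score is closest to 0.5.
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

# Toy probability scorer standing in for a trained defect classifier.
proba = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.49}.get

pool = ["a", "b", "c", "d"]
labeled = []
for _ in range(2):                     # labeling budget of two queries
    query = most_uncertain(pool, proba)
    pool.remove(query)
    labeled.append(query)              # an oracle would label `query` here
print(labeled)
```

Items the model is already confident about ("a", "c") are never queried, which is the mechanism by which active learning reduces labeling effort without hurting performance.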