Maier, Andreas
A Survey of Incremental Transfer Learning: Combining Peer-to-Peer Federated Learning and Domain Incremental Learning for Multicenter Collaboration
Huang, Yixing, Bert, Christoph, Gomaa, Ahmed, Fietkau, Rainer, Maier, Andreas, Putz, Florian
Due to data privacy constraints, data sharing among multiple clinical centers is restricted, which impedes the development of high-performance deep learning models from multicenter collaboration. Naive weight transfer methods share intermediate model weights without raw data and hence can bypass data privacy restrictions. However, performance drops are typically observed when the model is transferred from one center to the next because of the forgetting problem. Incremental transfer learning, which combines peer-to-peer federated learning and domain incremental learning, can overcome the data privacy issue while preserving model performance by using continual learning techniques. In this work, a conventional domain/task incremental learning framework is adapted for incremental transfer learning. A comprehensive survey on the efficacy of different regularization-based continual learning methods for multicenter collaboration is performed. The influences of data heterogeneity, classifier head setting, network optimizer, model initialization, center order, and weight transfer type are investigated thoroughly. Our framework is publicly accessible to the research community for further development.
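The abstract above does not say which regularization-based continual learning methods are surveyed, so the following is only a minimal sketch of the general idea, assuming a quadratic penalty that pulls the current center's weights toward those received from the previous center (the form used by L2 and EWC-style regularizers). The function name, the `importance` weights, and the `lam` parameter are hypothetical and not taken from the authors' framework.

```python
from typing import Sequence


def regularized_loss(
    task_loss: float,
    params: Sequence[float],
    prev_params: Sequence[float],
    importance: Sequence[float],
    lam: float,
) -> float:
    """Task loss plus a quadratic penalty toward the previous center's weights.

    importance[i] weights how strongly parameter i is anchored; all ones gives
    a plain L2 penalty, while EWC-style methods would use Fisher information
    estimates here (assumption, not stated in the abstract).
    """
    if not (len(params) == len(prev_params) == len(importance)):
        raise ValueError("params, prev_params and importance must have equal length")
    penalty = sum(
        w * (p - q) ** 2 for p, q, w in zip(params, prev_params, importance)
    )
    return task_loss + 0.5 * lam * penalty


# Example: only the first parameter drifted away from the previous center.
loss = regularized_loss(1.0, [1.0, 2.0], [0.0, 2.0], [2.0, 1.0], lam=0.5)
```

A larger `lam` trades plasticity on the current center's data for less forgetting of earlier centers.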
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology
Huang, Yixing, Gomaa, Ahmed, Semrau, Sabine, Haderlein, Marlen, Lettmaier, Sebastian, Weissmann, Thomas, Grigo, Johanna, Tkhayat, Hassen Ben, Frey, Benjamin, Gaipl, Udo S., Distel, Luitpold V., Maier, Andreas, Fietkau, Rainer, Bert, Christoph, Putz, Florian
The potential of large language models in medicine for education and decision making purposes has been demonstrated as they achieve decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. In this work, we evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology using the 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases. For the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieve scores of 63.65% and 74.57%, respectively, highlighting the advantage of the latest ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology are identified to some extent. Specifically, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology, as per the ACR knowledge domain. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry. It lacks proficiency in in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach to each case with high correctness and comprehensiveness. Importantly, it provides novel treatment aspects for many cases, which are not suggested by any human experts. Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as the potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Because of the risk of hallucination, facts provided by ChatGPT always need to be verified.
Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages
Arasteh, Soroosh Tayebi, Rios-Urrego, Cristian David, Noeth, Elmar, Maier, Andreas, Yang, Seung Hee, Rusz, Jan, Orozco-Arroyave, Juan Rafael
Among automatic Parkinson's disease (PD) assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing patient speech data with each other. In this paper, we employ federated learning (FL) for PD detection using speech signals from 3 real-world language corpora of German, Spanish, and Czech, each from a separate institution. Recently, deep learning (DL)-based methods have gained a lot of attention for analyzing PD speech signals [7, 8]. However, a major impediment to developing robust DL models is the need for accessing lots of training data, which is challenging for many institutions. Thus, benefiting from data from different external institutions could solve this issue. However, strict patient data privacy regulations in the medical context make this infeasible in most cases in real-world [...]
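The text above states that federated learning is used across three institutions but does not name the aggregation rule. As an illustration only, the sketch below assumes the common FedAvg-style scheme, in which a server averages the locally trained parameters weighted by each site's number of training samples. The parameter name `"w"`, the site sizes, and the weight values are made up for the example.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Sample-size-weighted average of per-site model parameters (FedAvg-style).

    Only parameters leave each site; the raw speech recordings never do.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    averaged: Weights = {}
    for name, reference in site_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Hypothetical one-parameter models from the German, Spanish, and Czech sites.
german, spanish, czech = {"w": [0.0]}, {"w": [2.0]}, {"w": [4.0]}
global_model = federated_average([german, spanish, czech], [1, 1, 2])
```

In a full training loop, the averaged model would be sent back to every site for the next round of local training.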
Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals
Sun, Susu, Woerner, Stefano, Maier, Andreas, Koch, Lisa M., Baumgartner, Christian F.
Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.
Deep Learning-based Anonymization of Chest Radiographs: A Utility-preserving Measure for Patient Privacy
Packhäuser, Kai, Gündel, Sebastian, Thamm, Florian, Denzinger, Felix, Maier, Andreas
Robust and reliable anonymization of chest radiographs constitutes an essential step before publishing large datasets of such images for research purposes. The conventional anonymization process is carried out by obscuring personal information in the images with black boxes and removing or replacing meta-information. However, such simple measures retain biometric information in the chest radiographs, allowing patients to be re-identified by a linkage attack. Therefore, there is an urgent need to obfuscate the biometric information appearing in the images. We propose the first deep learning-based approach (PriCheXy-Net) to anonymize chest radiographs in a targeted manner while maintaining data utility for diagnostic and machine learning purposes. Our model architecture is a composition of three independent neural networks that, when collectively used, allow for learning a deformation field that is able to impede patient re-identification. Quantitative results on the ChestX-ray14 dataset show a reduction of patient re-identification from 81.8% to 57.7% (AUC) after re-training with little impact on the abnormality classification performance. This indicates the ability to preserve underlying abnormality patterns while increasing patient privacy. Lastly, we compare our proposed anonymization approach with two other obfuscation-based methods (Privacy-Net, DP-Pix) and demonstrate the superiority of our method towards resolving the privacy-utility trade-off for chest radiographs.
Heat Demand Forecasting with Multi-Resolutional Representation of Heterogeneous Temporal Ensemble
Ramachandran, Adithya, Chatterjee, Satyaki, Bayer, Siming, Maier, Andreas, Flensmark, Thorkil
One of the primary challenges faced by utility companies is ensuring efficient supply with minimal greenhouse gas emissions. The advent of smart meters and smart grids provides an unprecedented advantage in realizing an optimised supply of thermal energies through proactive techniques such as load forecasting. In this paper, we propose a forecasting framework for heat demand based on neural networks where the time series are encoded as scalograms equipped with the capacity of embedding exogenous variables such as weather and holiday/non-holiday status. Subsequently, CNNs are utilized to predict the heat load multi-step ahead. Finally, the proposed framework is compared with other state-of-the-art methods, such as SARIMAX and LSTM. The quantitative results from retrospective experiments show that the proposed framework consistently outperforms the state-of-the-art baseline method with real-world data acquired from Denmark. A minimal mean error of 7.54% for MAPE and 417 kW for RMSE is achieved with the proposed framework in comparison to all other methods.
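For readers unfamiliar with the two error measures reported above, the following sketch computes MAPE and RMSE from their standard definitions. The load values in the example are invented for illustration and are not taken from the paper's Danish dataset.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same unit as the inputs (e.g. kW)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical heat loads in kW for two forecast steps.
observed = [100.0, 200.0]
forecast = [110.0, 180.0]
error_percent = mape(observed, forecast)
error_kw = rmse(observed, forecast)
```

MAPE is scale-free, which makes it comparable across networks of different size, whereas RMSE keeps the physical unit and penalizes large deviations more strongly.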
Conceptual Cognitive Maps Formation with Neural Successor Networks and Word Embeddings
Stoewer, Paul, Schilling, Achim, Maier, Andreas, Krauss, Patrick
The human brain possesses the extraordinary capability to contextualize the information it receives from our environment. The entorhinal-hippocampal complex plays a critical role in this function, as it is deeply engaged in memory processing and constructing cognitive maps using place and grid cells. Comprehending and leveraging this ability could significantly augment the field of artificial intelligence. The multi-scale successor representation serves as a good model for the functionality of place and grid cells and has already shown promise in this role. Here, we introduce a model that employs successor representations and neural networks, along with word embedding vectors, to construct a cognitive map of three separate concepts. The network adeptly learns two different scaled maps and situates new information in proximity to related pre-existing representations. The dispersion of information across the cognitive map varies according to its scale - either being heavily concentrated, resulting in the formation of the three concepts, or spread evenly throughout the map. We suggest that our model could potentially improve current AI models by providing multi-modal context information to any input, based on a similarity metric for the input and pre-existing knowledge representations.
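As background for the successor representation mentioned above, the sketch below computes it for a small state graph from its textbook definition, M = sum over t of gamma^t T^t = (I - gamma T)^(-1), where T is a state transition matrix and the discount factor gamma sets the scale of the map. This is not the authors' neural network model; the two-state transition matrix and the gamma values are assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def successor_representation(transition: Matrix, gamma: float, n_iter: int = 200) -> Matrix:
    """Approximate M = (I - gamma * T)^-1 via the fixed point M = I + gamma * T @ M.

    After k iterations M holds the truncated sum of gamma^t * T^t for t = 0..k,
    which converges for 0 <= gamma < 1 and a row-stochastic T.
    """
    n = len(transition)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    m = [row[:] for row in identity]
    for _ in range(n_iter):
        m = [
            [
                identity[i][j]
                + gamma * sum(transition[i][k] * m[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return m


# Two states that always switch to each other, evaluated at two scales.
T = [[0.0, 1.0], [1.0, 0.0]]
fine_map = successor_representation(T, gamma=0.5)
coarse_map = successor_representation(T, gamma=0.9)
```

A small gamma concentrates the representation on immediate successors, while a gamma close to 1 spreads it over states reached far in the future.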
Handling Label Uncertainty on the Example of Automatic Detection of Shepherd's Crook RCA in Coronary CT Angiography
Denzinger, Felix, Wels, Michael, Taubmann, Oliver, Kordon, Florian, Wagner, Fabian, Mehltretter, Stephanie, Gülsün, Mehmet A., Schöbinger, Max, André, Florian, Buss, Sebastian, Görich, Johannes, Sühling, Michael, Maier, Andreas
Coronary artery disease (CAD) is often treated minimally invasively with a catheter being inserted into the diseased coronary vessel. If a patient exhibits a Shepherd's Crook (SC) Right Coronary Artery (RCA) - an anatomical norm variant of the coronary vasculature - the complexity of this procedure is increased. Automated reporting of this variant from coronary CT angiography screening would ease prior risk assessment. We propose a 1D convolutional neural network which leverages a sequence of residual dilated convolutions to automatically determine this norm variant from a previously extracted vessel centerline. As the SC RCA is not clearly defined with respect to concrete measurements, labeling also includes qualitative aspects. Therefore, 4.23% of the samples in our dataset of 519 RCA centerlines were labeled as unsure SC RCAs, with 5.97% being labeled as sure SC RCAs. We explore measures to handle this label uncertainty, namely global/model-wise random assignment, exclusion, and soft label assignment. Furthermore, we evaluate how this uncertainty can be leveraged for the determination of a rejection class. With our best configuration, we reach an area under the receiver operating characteristic curve (AUC) of 0.938 on confident labels. Moreover, we observe an increase of up to 0.020 AUC when rejecting 10% of the data and leveraging the labeling uncertainty information in the exclusion process.
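To make the ideas of soft label assignment and a rejection class more concrete, here is a minimal sketch under stated assumptions: unsure cases receive a target of 0.5 in a binary cross-entropy loss, and rejection discards the predictions closest to 0.5. Both choices, as well as the label names, are hypothetical; the abstract does not specify the soft target value or the exact rejection criterion.

```python
import math
from typing import Dict, List, Sequence

# Assumed soft targets; the value used for unsure cases is not given in the abstract.
SOFT_TARGETS: Dict[str, float] = {"no_sc": 0.0, "unsure_sc": 0.5, "sure_sc": 1.0}


def soft_bce(prob: float, label: str) -> float:
    """Binary cross-entropy between a predicted probability and a soft target."""
    target = SOFT_TARGETS[label]
    p = min(max(prob, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def keep_after_rejection(probs: Sequence[float], fraction: float) -> List[int]:
    """Indices kept after rejecting the given fraction of least confident predictions."""
    n_reject = int(len(probs) * fraction)
    by_confidence = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return sorted(by_confidence[n_reject:])


# An unsure case predicted at 0.5 incurs the minimum possible loss for its target.
loss_unsure = soft_bce(0.5, "unsure_sc")
kept = keep_after_rejection([0.9, 0.52, 0.1, 0.3], fraction=0.25)
```

With a soft target, the network is not pushed toward either extreme for ambiguous anatomy, which is one plausible reason such labels help when a rejection class is derived from the output confidence.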
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement
Schröter, Hendrik, Rosenkranz, Tobias, Escalante-B., Alberto N., Maier, Andreas
Multi-frame algorithms for single-channel speech enhancement are able to take advantage of short-time correlations within the speech signal. Deep Filtering (DF) was proposed to directly estimate a complex filter in the frequency domain to take advantage of these correlations. In this work, we present a real-time speech enhancement demo using DeepFilterNet. DeepFilterNet's efficiency is enabled by exploiting domain knowledge of speech production and psychoacoustic perception. Our model is able to match state-of-the-art speech enhancement benchmarks while achieving a real-time factor of 0.19 on a single-threaded notebook CPU. The framework as well as pretrained weights have been published under an open source license.
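The filtering step described above can be sketched for a single frequency bin as a complex FIR filter over neighboring time frames, Y(k, f) = sum over i of C(k, i, f) * X(k - i, f). This is a simplified illustration, not DeepFilterNet's implementation: the coefficients are fixed by hand here instead of being predicted by a network, no look-ahead is used, and frames before the start of the signal are treated as zero.

```python
from typing import List, Sequence


def deep_filter_bin(
    spectrum: Sequence[complex], coefs: Sequence[Sequence[complex]]
) -> List[complex]:
    """Apply a per-frame complex filter along time for one frequency bin.

    spectrum[k] is the noisy STFT value X(k, f); coefs[k][i] is the filter tap
    C(k, i, f) applied to frame k - i. In a real system the taps would be
    estimated by a neural network (assumption for this sketch: they are given).
    """
    if len(spectrum) != len(coefs):
        raise ValueError("need one set of filter taps per frame")
    enhanced: List[complex] = []
    for k, taps in enumerate(coefs):
        acc = 0j
        for i, c in enumerate(taps):
            if k - i >= 0:  # frames before the signal start contribute nothing
                acc += c * spectrum[k - i]
        enhanced.append(acc)
    return enhanced


# Three frames of one bin filtered with the same two taps in every frame.
noisy = [1 + 0j, 2 + 0j, 3 + 0j]
taps = [[1 + 0j, 0.5 + 0j]] * 3
enhanced = deep_filter_bin(noisy, taps)
```

The reported real-time factor of 0.19 relates processing time to audio duration: at that factor, one second of audio is processed in about 0.19 seconds.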
Joint MR sequence optimization beats pure neural network approaches for spin-echo MRI super-resolution
Dang, Hoai Nam, Golkov, Vladimir, Wimmer, Thomas, Cremers, Daniel, Maier, Andreas, Zaiss, Moritz
Current MRI super-resolution (SR) methods only use existing contrasts acquired from typical clinical sequences as input for the neural network (NN). In turbo spin echo (TSE) sequences the sequence parameters can have a strong influence on the actual resolution of the acquired image and have consequently a considerable impact on the performance of the NN. We propose a known-operator learning approach to perform an end-to-end optimization of MR sequence and neural network parameters for SR-TSE. This MR-physics-informed training procedure jointly optimizes the radiofrequency pulse train of a proton density- (PD-) and T2-weighted TSE and a subsequently applied convolutional neural network to predict the corresponding PDw and T2w super-resolution TSE images. The found radiofrequency pulse train designs generate an optimal signal for the NN to perform the SR task. Our method generalizes from the simulation-based optimization to in vivo measurements and the acquired physics-informed SR images show higher correlation with a time-consuming segmented high-resolution TSE sequence compared to a pure network training approach.