Maier, Andreas
Analysing Diffusion Segmentation for Medical Images
Öttl, Mathias, Mei, Siyuan, Wilm, Frauke, Steenpass, Jana, Rübner, Matthias, Hartmann, Arndt, Beckmann, Matthias, Fasching, Peter, Maier, Andreas, Erber, Ramona, Breininger, Katharina
Denoising Diffusion Probabilistic models have become increasingly popular due to their ability to offer probabilistic modeling and generate diverse outputs. This versatility inspired their adaptation for image segmentation, where multiple predictions of the model can produce segmentation results that not only achieve high quality but also capture the uncertainty inherent in the model. Here, powerful architectures were proposed for improving diffusion segmentation performance. However, there is a notable lack of analysis and discussions on the differences between diffusion segmentation and image generation, and thorough evaluations are missing that distinguish the improvements these architectures provide for segmentation in general from their benefit for diffusion segmentation specifically. In this work, we critically analyse and discuss how diffusion segmentation for medical images differs from diffusion image generation, with a particular focus on the training behavior. Furthermore, we conduct an assessment how proposed diffusion segmentation architectures perform when trained directly for segmentation. Lastly, we explore how different medical segmentation tasks influence the diffusion segmentation behavior and the diffusion process could be adapted accordingly. With these analyses, we aim to provide in-depth insights into the behavior of diffusion segmentation that allow for a better design and evaluation of diffusion segmentation methods in the future.
Integer Optimization of CT Trajectories using a Discrete Data Completeness Formulation
Schneider, Linda-Sophie, Herl, Gabriel, Maier, Andreas
X-ray computed tomography (CT) plays a key role in digitizing three-dimensional structures for a wide range of medical and industrial applications. Traditional CT systems often rely on standard circular and helical scan trajectories, which may not be optimal for challenging scenarios involving large objects, complex structures, or resource constraints. In response to these challenges, we are exploring the potential of twin robotic CT systems, which offer the flexibility to acquire projections from arbitrary views around the object of interest. Ensuring complete and mathematically sound reconstructions becomes critical in such systems. In this work, we present an integer programming-based CT trajectory optimization method. Utilizing discrete data completeness conditions, we formulate an optimization problem to select an optimized set of projections. This approach enforces data completeness and considers absorption-based metrics for reliability evaluation. We compare our method with an equidistant circular CT trajectory and a greedy approach. While greedy already performs well in some cases, we provide a way to improve greedy-based projection selection using an integer optimization approach. Our approach improves CT trajectories and quantifies the optimality of the solution in terms of an optimality gap.
Data-Driven Filter Design in FBP: Transforming CT Reconstruction with Trainable Fourier Series
Sun, Yipeng, Schneider, Linda-Sophie, Fan, Fuxin, Thies, Mareike, Gu, Mingxuan, Mei, Siyuan, Zhou, Yuzhong, Bayer, Siming, Maier, Andreas
In this study, we introduce a Fourier series-based trainable filter for computed tomography (CT) reconstruction within the filtered backprojection (FBP) framework. This method overcomes the limitation in noise reduction, inherent in conventional FBP methods, by optimizing Fourier series coefficients to construct the filter. This method enables robust performance across different resolution scales and maintains computational efficiency with minimal increment for the trainable parameters compared to other deep learning frameworks. Additionally, we propose Gaussian edge-enhanced (GEE) loss function that prioritizes the $L_1$ norm of high-frequency magnitudes, effectively countering the blurring problems prevalent in mean squared error (MSE) approaches. The model's foundation in the FBP algorithm ensures excellent interpretability, as it relies on a data-driven filter with all other parameters derived through rigorous mathematical procedures. Designed as a plug-and-play solution, our Fourier series-based filter can be easily integrated into existing CT reconstruction models, making it a versatile tool for a wide range of practical applications. Our research presents a robust and scalable method that expands the utility of FBP in both medical and scientific imaging.
Attention-Guided Erasing: A Novel Augmentation Method for Enhancing Downstream Breast Density Classification
Panambur, Adarsh Bhandary, Yu, Hui, Bhat, Sheethal, Madhu, Prathmesh, Bayer, Siming, Maier, Andreas
The assessment of breast density is crucial in the context of breast cancer screening, especially in populations with a higher percentage of dense breast tissues. This study introduces a novel data augmentation technique termed Attention-Guided Erasing (AGE), devised to enhance the downstream classification of four distinct breast density categories in mammography following the BI-RADS recommendation in the Vietnamese cohort. The proposed method integrates supplementary information during transfer learning, utilizing visual attention maps derived from a vision transformer backbone trained using the self-supervised DINO method. These maps are utilized to erase background regions in the mammogram images, unveiling only the potential areas of dense breast tissues to the network. Through the incorporation of AGE during transfer learning with varying random probabilities, we consistently surpass classification performance compared to scenarios without AGE and the traditional random erasing transformation. We validate our methodology using the publicly available VinDr-Mammo dataset. Specifically, we attain a mean F1-score of 0.5910, outperforming values of 0.5594 and 0.5691 corresponding to scenarios without AGE and with random erasing (RE), respectively. This superiority is further substantiated by t-tests, revealing a p-value of p<0.0001, underscoring the statistical significance of our approach.
Building a Non-native Speech Corpus Featuring Chinese-English Bilingual Children: Compilation and Rationale
Hung, Hiuchung, Maier, Andreas, Piske, Thorsten
This paper introduces a non-native speech corpus consisting of narratives from fifty 5- to 6-year-old Chinese-English children. Transcripts totaling 6.5 hours of children taking a narrative comprehension test in English (L2) are presented, along with human-rated scores and annotations of grammatical and pronunciation errors. The children also completed the parallel MAIN tests in Chinese (L1) for reference purposes. For all tests we recorded audio and video with our innovative self-developed remote collection methods. The video recordings serve to mitigate the challenge of low intelligibility in L2 narratives produced by young children during the transcription process. This corpus offers valuable resources for second language teaching and has the potential to enhance the overall performance of automatic speech recognition (ASR).
SYNTA: A novel approach for deep learning-based image analysis in muscle histopathology using photo-realistic synthetic data
Mill, Leonid, Aust, Oliver, Ackermann, Jochen A., Burger, Philipp, Pascual, Monica, Palumbo-Zerr, Katrin, Krönke, Gerhard, Uderhardt, Stefan, Schett, Georg, Clemen, Christoph S., Schröder, Rolf, Holtzhausen, Christian, Jabari, Samir, Maier, Andreas, Grüneboom, Anika
Artificial intelligence (AI), machine learning, and deep learning (DL) methods are becoming increasingly important in the field of biomedical image analysis. However, to exploit the full potential of such methods, a representative number of experimentally acquired images containing a significant number of manually annotated objects is needed as training data. Here we introduce SYNTA (synthetic data) as a novel approach for the generation of synthetic, photo-realistic, and highly complex biomedical images as training data for DL systems. We show the versatility of our approach in the context of muscle fiber and connective tissue analysis in histological sections. We demonstrate that it is possible to perform robust and expert-level segmentation tasks on previously unseen real-world data, without the need for manual annotations using synthetic training data alone. Being a fully parametric technique, our approach poses an interpretable and controllable alternative to Generative Adversarial Networks (GANs) and has the potential to significantly accelerate quantitative image analysis in a variety of biomedical applications in microscopy and beyond.
Comparative Analysis of Radiomic Features and Gene Expression Profiles in Histopathology Data Using Graph Neural Networks
Monroy, Luis Carlos Rivera, Rist, Leonhard, Eberhardt, Martin, Ostalecki, Christian, Bauer, Andreas, Vera, Julio, Breininger, Katharina, Maier, Andreas
This study leverages graph neural networks to integrate MELC data with Radiomic-extracted features for melanoma classification, focusing on cell-wise analysis. It assesses the effectiveness of gene expression profiles and Radiomic features, revealing that Radiomic features, particularly when combined with UMAP for dimensionality reduction, significantly enhance classification performance. Notably, using Radiomics contributes to increased diagnostic accuracy and computational efficiency, as it allows for the extraction of critical data from fewer stains, thereby reducing operational costs. This methodology marks an advancement in computational dermatology for melanoma cell classification, setting the stage for future research and potential developments.
Multi-Modal Cognitive Maps based on Neural Networks trained on Successor Representations
Stoewer, Paul, Schilling, Achim, Maier, Andreas, Krauss, Patrick
Cognitive maps are a proposed concept on how the brain efficiently organizes memories and retrieves context out of them. The entorhinal-hippocampal complex is heavily involved in episodic and relational memory processing, as well as spatial navigation and is thought to built cognitive maps via place and grid cells. To make use of the promising properties of cognitive maps, we set up a multi-modal neural network using successor representations which is able to model place cell dynamics and cognitive map representations. Here, we use multi-modal inputs consisting of images and word embeddings. The network learns the similarities between novel inputs and the training database and therefore the representation of the cognitive map successfully. Subsequently, the prediction of the network can be used to infer from one modality to another with over $90\%$ accuracy. The proposed method could therefore be a building block to improve current AI systems for better understanding of the environment and the different modalities in which objects appear. The association of specific modalities with certain encounters can therefore lead to context awareness in novel situations when similar encounters with less information occur and additional information can be inferred from the learned cognitive map. Cognitive maps, as represented by the entorhinal-hippocampal complex in the brain, organize and retrieve context from memories, suggesting that large language models (LLMs) like ChatGPT could harness similar architectures to function as a high-level processing center, akin to how the hippocampus operates within the cortex hierarchy. Finally, by utilizing multi-modal inputs, LLMs can potentially bridge the gap between different forms of data (like images and words), paving the way for context-awareness and grounding of abstract concepts through learned associations, addressing the grounding problem in AI.
OCTDL: Optical Coherence Tomography Dataset for Image-Based Deep Learning Methods
Kulyabin, Mikhail, Zhdanov, Aleksei, Nikiforova, Anastasia, Stepichev, Andrey, Kuznetsova, Anna, Ronkin, Mikhail, Borisov, Vasilii, Bogachev, Alexander, Korotkich, Sergey, Constable, Paul A, Maier, Andreas
Optical coherence tomography (OCT) is a non-invasive imaging technique with extensive clinical applications in ophthalmology. OCT enables the visualization of the retinal layers, playing a vital role in the early detection and monitoring of retinal diseases. OCT uses the principle of light wave interference to create detailed images of the retinal microstructures, making it a valuable tool for diagnosing ocular conditions. This work presents an open-access OCT dataset (OCTDL) comprising over 1600 high-resolution OCT images labeled according to disease group and retinal pathology. The dataset consists of OCT records of patients with Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME), Epiretinal Membrane (ERM), Retinal Artery Occlusion (RAO), Retinal Vein Occlusion (RVO), and Vitreomacular Interface Disease (VID). The images were acquired with an Optovue Avanti RTVue XR using raster scanning protocols with dynamic scan length and image resolution. Each retinal b-scan was acquired by centering on the fovea and interpreted and cataloged by an experienced retinal specialist. In this work, we applied Deep Learning classification techniques to this new open-access dataset.
The effect of speech pathology on automatic speaker verification -- a large-scale study
Arasteh, Soroosh Tayebi, Weise, Tobias, Schuster, Maria, Noeth, Elmar, Maier, Andreas, Yang, Seung Hee
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3,800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of 0.89% with a standard deviation of 0.06%, outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system's performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.