Collaborating Authors

 Ahmadi, Seyed-Ahmad


MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

arXiv.org Artificial Intelligence

Prior to the deep learning era, shape was commonly used to describe objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. As examples, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In the future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback


DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation

arXiv.org Artificial Intelligence

Although purely transformer-based architectures showed promising performance in many computer vision tasks, many hybrid models consisting of CNN and transformer blocks are introduced to fit more specialized tasks. Nevertheless, despite the performance gain of both pure and hybrid transformer-based architectures compared to CNNs in medical imaging segmentation, their high training cost and complexity make it challenging to use them in real scenarios. In this work, we propose simple architectures based on purely convolutional layers, and show that by just taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network (e.g., DINO) one can outperform complex transformer-based networks with much less computation costs. The proposed architecture is composed of two encoder branches with the original image as input in one branch and the attention map visualizations of the same image from multiple self-attention heads from a pre-trained DINO model (as multiple channels) in the other branch. The results of our experiments on two publicly available medical imaging datasets show that the proposed pipeline outperforms U-Net and the state-of-the-art medical image segmentation models.


Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)

arXiv.org Machine Learning

Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular Computer Aided Diagnosis (CADx) using machine learning (ML). Numerous ML approaches for CADx have been proposed in the literature. However, these approaches assume full data availability, which is not always feasible in clinical data. To account for missing data, incomplete data samples are either removed or imputed, which could lead to data bias and may negatively affect classification performance. As a solution, we propose end-to-end learning of imputation and disease prediction on incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC). MGMC uses multiple recurrent graph convolutional networks, where each graph represents an independent population model based on a key clinical meta-feature like age, sex, or cognitive function. Graph signal aggregation from local patient neighborhoods, combined with multigraph signal fusion via self-attention, has a regularizing effect on both matrix reconstruction and classification performance. Our proposed approach is able to impute class-relevant features as well as perform accurate classification on two publicly available medical datasets. We empirically show the superiority of our proposed approach in terms of classification and imputation performance when compared with state-of-the-art approaches. MGMC enables disease prediction in multimodal and incomplete medical datasets. These findings could serve as a baseline for future CADx approaches which utilize incomplete datasets.


Domain-specific loss design for unsupervised physical training: A new approach to modeling medical ML solutions

arXiv.org Machine Learning

Today, cataract surgery is the most frequently performed ophthalmic surgery in the world. The cataract, a developing opacity of the human eye lens, constitutes the world's most frequent cause of blindness. During surgery, the lens is removed and replaced by an artificial intraocular lens (IOL). To prevent patients from needing strong visual aids after surgery, a precise prediction of the optical properties of the inserted IOL is crucial. Considerable effort has gone into developing methods to predict these properties from biometric eye data obtained by OCT devices, recently also by employing machine learning. Existing methods consider either only biometric data or physical models, but rarely both, and often neglect the IOL geometry. In this work, we propose OpticNet, a novel optical refraction network, loss function, and training scheme which is unsupervised, domain-specific, and physically motivated. We derive a precise light propagation eye model using single-ray raytracing and formulate a differentiable loss function that back-propagates physical gradients into the network. Further, we propose a new transfer learning procedure, which allows unsupervised training on the physical model and fine-tuning of the network on a cohort of real IOL patient cases. We show that our network is not only superior to systems trained with standard procedures but also that our method outperforms the current state of the art in IOL calculation when compared on two biometric data sets.


Decision Support for Intoxication Prediction Using Graph Convolutional Networks

arXiv.org Artificial Intelligence

Every day, poison control centers (PCC) are called for immediate classification and treatment recommendations if an acute intoxication is suspected. Due to the time-sensitive nature of these cases, doctors are required to propose a correct diagnosis and intervention within a minimal time frame. Usually the toxin is known and recommendations can be made accordingly. However, in challenging cases only symptoms are mentioned and doctors have to rely on their clinical experience. Medical experts and our analyses of a regional dataset of intoxication records provide evidence that this is challenging, since occurring symptoms may not always match the textbook description due to regional distinctions, inter-rater variance, and institutional workflow. Computer-aided diagnosis (CADx) can provide decision support, but approaches so far do not consider additional information of the reported cases like age or gender, despite their potential value towards a correct diagnosis. In this work, we propose a new machine learning based CADx method which fuses symptoms and meta information of the patients using graph convolutional networks. We further propose a novel symptom matching method that allows the effective incorporation of prior knowledge into the learning process and evidently stabilizes the poison prediction. We validate our method against 10 medical doctors with different experience diagnosing intoxication cases for 10 different toxins from the PCC in Munich and show our method's superiority in performance for poison prediction.


Multi-modal Graph Fusion for Inductive Disease Classification in Incomplete Datasets

arXiv.org Machine Learning

Clinical diagnostic decision making and population-based studies often rely on multi-modal data which is noisy and incomplete. Recently, several works proposed geometric deep learning approaches to solve disease classification, by modeling patients as nodes in a graph, along with graph signal processing of multi-modal features. Many of these approaches are limited by assuming modality- and feature-completeness, and by transductive inference, which requires re-training of the entire model for each new test sample. In this work, we propose a novel inductive graph-based approach that can generalize to out-of-sample patients, despite missing features from entire modalities per patient. We propose multi-modal graph fusion which is trained end-to-end towards node-level classification. We demonstrate the fundamental working principle of this method on a simplified MNIST toy dataset. In experiments on medical data, our method outperforms a single static graph approach in multi-modal disease classification.


Adaptive image-feature learning for disease classification using inductive graph networks

arXiv.org Machine Learning

Recently, Geometric Deep Learning (GDL) has been introduced as a novel and versatile framework for computer-aided disease classification. GDL uses patient meta-information such as age and gender to model patient cohort relations in a graph structure. Concepts from graph signal processing are leveraged to learn the optimal mapping of multi-modal features, e.g. from images to disease classes. Related studies so far have considered image features that are extracted in a pre-processing step. We hypothesize that such an approach prevents the network from optimizing feature representations towards achieving the best performance in the graph network. We propose a new network architecture that exploits an inductive end-to-end learning approach for disease classification, where filters from both the CNN and the graph are trained jointly. We validate this architecture against state-of-the-art inductive graph networks and demonstrate significantly improved classification scores on a modified MNIST toy dataset, as well as comparable classification results with higher stability on a chest X-ray image dataset. Additionally, we explain how the structural information of the graph affects both the image filters and the feature learning.
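
The idea of modeling cohort relations from meta-information can be illustrated with a minimal sketch. It is not the architecture from the abstract: the edge rule (same sex and an age gap of at most five years), the helper names `build_adjacency` and `aggregate`, and the toy feature values are all assumptions chosen for illustration, and plain mean aggregation stands in for a learned graph convolution.

```python
def build_adjacency(patients, max_age_gap=5):
    """Connect patients who share sex and are close in age (hypothetical rule)."""
    n = len(patients)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_sex = patients[i]["sex"] == patients[j]["sex"]
            close_age = abs(patients[i]["age"] - patients[j]["age"]) <= max_age_gap
            # Self-loops keep each patient's own features in the average.
            if i == j or (same_sex and close_age):
                adj[i][j] = 1.0
    return adj


def aggregate(adj, features):
    """Replace each node's features by the mean over its graph neighborhood."""
    dim = len(features[0])
    out = []
    for row in adj:
        degree = sum(row)
        out.append([
            sum(w * f[k] for w, f in zip(row, features)) / degree
            for k in range(dim)
        ])
    return out


# Toy cohort: two similar patients and one unrelated patient.
patients = [
    {"age": 70, "sex": "F"},
    {"age": 72, "sex": "F"},
    {"age": 35, "sex": "M"},
]
# Stand-in for per-patient image features (made-up values).
features = [[1.0, 0.0], [3.0, 2.0], [10.0, 10.0]]

adj = build_adjacency(patients)
smoothed = aggregate(adj, features)
```

In this toy cohort the first two patients are linked and their features are averaged, while the third patient has no neighbors and keeps its original features. In a jointly trained setup, the features fed into `aggregate` would come from a CNN whose filters receive gradients through the graph layer.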


InceptionGCN: Receptive Field Aware Graph Convolutional Network for Disease Prediction

arXiv.org Machine Learning

Geometric deep learning provides a principled and versatile manner for the integration of imaging and non-imaging modalities in the medical domain. Graph Convolutional Networks (GCNs) in particular have been explored on a wide variety of problems such as disease prediction, segmentation, and matrix completion by leveraging large, multimodal datasets. In this paper, we introduce a new spectral domain architecture for deep learning on graphs for disease prediction. The novelty lies in defining geometric 'inception modules' which are capable of capturing intra- and inter-graph structural heterogeneity during convolutions. We design filters with different kernel sizes to build our architecture. We show our disease prediction results on two publicly available datasets. Further, we provide insights on the behaviour of regular GCNs and our proposed model under varying input scenarios on simulated data.
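
A rough way to picture filters with different receptive fields on a graph is to propagate a signal over one hop and over two hops and concatenate the results per node. The sketch below is a spatial simplification under stated assumptions, not the spectral filters of the paper: the row-normalized adjacency with self-loops, the hop sizes (1, 2), and the function names are all illustrative choices.

```python
def row_normalize_with_self_loops(adj):
    """Add self-loops and scale each row to sum to one."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        degree = sum(row)
        out.append([w / degree for w in row])
    return out


def propagate(a_hat, signal, hops):
    """Apply the normalized adjacency `hops` times to a scalar node signal."""
    for _ in range(hops):
        signal = [sum(w * s for w, s in zip(row, signal)) for row in a_hat]
    return signal


def inception_features(adj, signal, hop_sizes=(1, 2)):
    """Run one branch per receptive-field size and concatenate per node."""
    a_hat = row_normalize_with_self_loops(adj)
    branches = [propagate(a_hat, signal, k) for k in hop_sizes]
    return [list(node) for node in zip(*branches)]


# Path graph 0 - 1 - 2 - 3 with a unit signal on node 0.
path = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
signal = [1.0, 0.0, 0.0, 0.0]
features = inception_features(path, signal)
```

Node 2 sees nothing of the signal in the one-hop branch but receives a nonzero response in the two-hop branch, which is the kind of receptive-field difference that parallel branches are meant to capture.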


Stabilizing Inputs to Approximated Nonlinear Functions for Inference with Homomorphic Encryption in Deep Neural Networks

arXiv.org Machine Learning

Leveled Homomorphic Encryption (LHE) offers a potential solution that could allow sectors with sensitive data to utilize the cloud and securely deploy their models for remote inference with Deep Neural Networks (DNNs). However, this application faces several obstacles due to the limitations of LHE. One of the main problems is the incompatibility of nonlinear functions commonly used in DNNs with the operations supported by LHE, i.e. addition and multiplication. As is common in LHE approaches, we train a model with a nonlinear function, and replace it with a low-degree polynomial approximation at inference time on private data. While this typically leads to approximation errors and loss in prediction accuracy, we propose a method that reduces this loss to small values or eliminates it entirely, depending on simple hyper-parameters. This is achieved by the introduction of a novel and elegantly simple Min-Max normalization scheme, which scales inputs to nonlinear functions into ranges with low approximation error. While being intuitive in its concept and trivial to implement, we empirically show that it offers a stable and effective approximation solution to nonlinear functions in DNNs. In return, this can enable deeper networks with LHE, and facilitate the development of security- and privacy-aware analytics applications.
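
The effect of scaling inputs into a range where a low-degree polynomial is accurate can be seen in a small sketch. Everything specific here is an assumption for illustration: the sigmoid as the nonlinearity, its degree-3 Taylor expansion around zero as the polynomial, the target range [-1, 1], and range bounds taken from the sample values themselves. The paper's actual polynomial and hyper-parameters may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def poly_sigmoid(x):
    """Degree-3 Taylor expansion of the sigmoid around 0.

    Uses only additions and multiplications by constants, so it is the kind
    of function that could be evaluated under LHE.
    """
    return 0.5 + 0.25 * x - (x ** 3) / 48.0


def min_max_scale(values, lo, hi, a=-1.0, b=1.0):
    """Linearly map values from [lo, hi] into [a, b]."""
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def max_abs_error(values):
    """Worst-case gap between the true sigmoid and its polynomial stand-in."""
    return max(abs(sigmoid(v) - poly_sigmoid(v)) for v in values)


# Hypothetical pre-activation values with a wide spread.
raw = [-6.0, -3.5, -1.0, 0.0, 2.0, 4.5, 6.0]
scaled = min_max_scale(raw, lo=min(raw), hi=max(raw))

err_raw = max_abs_error(raw)        # polynomial diverges far from 0
err_scaled = max_abs_error(scaled)  # stays small inside [-1, 1]
```

On the raw values the polynomial is off by more than one at the extremes, while on the scaled values the error stays below 0.01. Because scaling changes what the nonlinearity receives, this sketch assumes the same normalization would also be in place during training.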


Classification of sparsely labeled spatio-temporal data through semi-supervised adversarial learning

arXiv.org Machine Learning

In recent years, Generative Adversarial Networks (GANs) have emerged as a powerful method for learning the mapping from noisy latent spaces to realistic data samples in high-dimensional space. So far, the development and application of GANs have been predominantly focused on spatial data such as images. In this project, we aim to model spatio-temporal sensor data instead, i.e. dynamic data over time. The main goal is to encode temporal data into a global and low-dimensional latent vector that captures the dynamics of the spatio-temporal signal. To this end, we incorporate auto-regressive RNNs, Wasserstein GAN loss, spectral norm weight constraints and a semi-supervised learning scheme into InfoGAN, a method for retrieval of meaningful latents in adversarial learning. To demonstrate the modeling capability of our method, we encode full-body skeletal human motion from a large dataset representing 60 classes of daily activities, recorded in a multi-Kinect setup. Initial results indicate competitive classification performance of the learned latent representations, compared to direct CNN/RNN inference. In future work, we plan to apply this method to a related problem in the medical domain, i.e. the recovery of meaningful latents in gait analysis of patients with vertigo and balance disorders.
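
Of the ingredients listed above, the spectral norm weight constraint is simple enough to sketch in isolation. The following is a generic power-iteration estimate of a weight matrix's largest singular value followed by rescaling, written in plain Python; the function names, the iteration count, and the example matrix are assumptions, and the abstract does not specify how the constraint was implemented in the project.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def spectral_norm(w, iters=50):
    """Estimate the largest singular value of w by power iteration."""
    wt = transpose(w)
    u = [1.0] * len(w)
    for _ in range(iters):
        v = unit(matvec(wt, u))  # right singular vector estimate
        u = unit(matvec(w, v))   # left singular vector estimate
    return sum(ui * wi for ui, wi in zip(u, matvec(w, v)))


def normalize_weights(w):
    """Rescale w so that its largest singular value is approximately 1."""
    sigma = spectral_norm(w)
    return [[x / sigma for x in row] for row in w]


# Example matrix with known singular values 3 and 1.
w = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(w)
w_sn = normalize_weights(w)
```

Dividing the weights by the estimated spectral norm bounds how much the layer can stretch its input, which is the property that makes this constraint a common companion to the Wasserstein loss.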