 Regensburg



Provable Recovery of Locally Important Signed Features and Interactions from Random Forest

Vuk, Kata, Ihlo, Nicolas Alexander, Behr, Merle

arXiv.org Machine Learning

Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In many domains, such as personalized medicine, local interpretations for individual predictions are often required, rather than global scores summarizing overall feature importance. Random Forests (RFs) are widely used in these settings, and existing interpretability methods typically exploit tree structures and split statistics to provide model-specific insights. However, theoretical understanding of local FII methods for RF remains limited, making it unclear how to interpret high importance scores for individual predictions. We propose a novel, local, model-specific FII method that identifies frequent co-occurrences of features along decision paths, combining global patterns with those observed on paths specific to a given test point. We prove that our method consistently recovers the true local signal features and their interactions under a Locally Spike Sparse (LSS) model and also identifies whether large or small feature values drive a prediction. We illustrate the usefulness of our method and theoretical results through simulation studies and a real-world data example.
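The path-based idea in this abstract can be illustrated with a toy sketch. The code below is not the authors' method: it only counts how often signed features (a feature plus whether the test point went to the large-value or small-value side of the split) and pairs of them co-occur on the decision paths of one test point. The nested-dict tree format, the function names, and the example forest are all hypothetical, and the combination with global patterns and the LSS model analysis from the paper are not reproduced.

```python
from collections import Counter
from itertools import combinations


def decision_path(tree, x):
    """Return the signed features (feature, sign) that x meets from root to leaf.

    Hypothetical tree format: internal nodes are dicts with keys
    "feature", "threshold", "left", "right"; leaves are plain floats.
    Sign "+" means x took the large-value branch, "-" the small-value branch.
    """
    path = []
    node = tree
    while isinstance(node, dict):
        go_right = x[node["feature"]] > node["threshold"]
        path.append((node["feature"], "+" if go_right else "-"))
        node = node["right"] if go_right else node["left"]
    return path


def cooccurrence_counts(forest, x):
    """Count, over all trees, signed features and signed pairs on x's paths."""
    counts = Counter()
    for tree in forest:
        signed = sorted(set(decision_path(tree, x)))
        for size in (1, 2):
            for combo in combinations(signed, size):
                counts[combo] += 1
    return counts


# Toy forest of three hand-built trees (illustrative only).
tree_a = {"feature": 0, "threshold": 0.5, "left": 0.0,
          "right": {"feature": 1, "threshold": 0.5, "left": 0.0, "right": 1.0}}
tree_b = {"feature": 1, "threshold": 0.4, "left": 0.0,
          "right": {"feature": 0, "threshold": 0.6, "left": 0.0, "right": 1.0}}
tree_c = {"feature": 2, "threshold": 0.5, "right": 0.0,
          "left": {"feature": 0, "threshold": 0.5, "left": 0.0, "right": 1.0}}

x_test = [0.9, 0.8, 0.1]
counts = cooccurrence_counts([tree_a, tree_b, tree_c], x_test)
```

In this toy forest, the signed pair of features 0 and 1 (both on the large-value side) appears on two of the three paths of `x_test`, which is the kind of frequent local co-occurrence the abstract describes.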


The MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024: Efficient and Robust Aggregation Methods for Federated Learning

Linardos, Akis, Pati, Sarthak, Baid, Ujjwal, Edwards, Brandon, Foley, Patrick, Ta, Kevin, Chung, Verena, Sheller, Micah, Khan, Muhammad Irfan, Jafaritadi, Mojtaba, Kontio, Elina, Khan, Suleiman, Mächler, Leon, Ezhov, Ivan, Shit, Suprosanna, Paetzold, Johannes C., Grimberg, Gustav, Nickel, Manuel A., Naccache, David, Siomos, Vasilis, Passerat-Palmbach, Jonathan, Tarroni, Giacomo, Kim, Daewoon, Klausmann, Leonard L., Shah, Prashant, Menze, Bjoern, Makris, Dimitrios, Bakas, Spyridon

arXiv.org Artificial Intelligence

We present the design and results of the MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024, which focuses on federated learning (FL) for glioma sub-region segmentation in multi-parametric MRI and evaluates new weight aggregation methods aimed at improving robustness and efficiency. Six participating teams were evaluated using a standardized FL setup and a multi-institutional dataset derived from the BraTS glioma benchmark, consisting of 1,251 training cases, 219 validation cases, and 570 hidden test cases with segmentations for enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Teams were ranked using a cumulative scoring system that considered both segmentation performance, measured by Dice Similarity Coefficient (DSC) and the 95th percentile Hausdorff Distance (HD95), and communication efficiency assessed through the convergence score. A PID-controller-based method achieved the top overall ranking, obtaining mean DSC values of 0.733, 0.761, and 0.751 for ET, TC, and WT, respectively, with corresponding HD95 values of 33.922 mm, 33.623 mm, and 32.309 mm, while also demonstrating the highest communication efficiency with a convergence score of 0.764. These findings advance the state of federated learning for medical imaging, surpassing top-performing methods from previous challenge iterations and highlighting PID controllers as effective mechanisms for stabilizing and optimizing weight aggregation in FL. The challenge code is available at https://github.com/FeTS-AI/Challenge.
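As a rough illustration of two quantities mentioned in this abstract, the sketch below computes the Dice Similarity Coefficient between two voxel sets and a case-count-weighted average of per-site weight vectors (the common FedAvg-style baseline for aggregation). It is not the challenge code and not the winning PID-controller-based method; the function names, the set-of-voxels mask representation, and the empty-mask convention are assumptions made for the example.

```python
def dice(pred, truth):
    """Dice Similarity Coefficient between two collections of voxel indices."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def weighted_average(updates, num_cases):
    """FedAvg-style aggregation: average weight vectors, weighted by case count."""
    total = sum(num_cases)
    dim = len(updates[0])
    return [
        sum(u[i] * n for u, n in zip(updates, num_cases)) / total
        for i in range(dim)
    ]


# Two overlapping toy masks given as (z, y, x) voxel coordinates.
pred_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
true_mask = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
score = dice(pred_mask, true_mask)  # 2 * 2 / (3 + 3)

# Two sites with 1 and 3 training cases contribute 2-dimensional weights.
aggregated = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

A learned or controller-based aggregation scheme would replace the fixed case-count weights in `weighted_average` with weights adapted across federated rounds.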


5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf

Neural Information Processing Systems

Constrained clustering has gained significant attention in the field of machine learning as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of stochastic gradient variational inference. By explicitly integrating domain knowledge in the form of probabilistic relations, our proposed model (DC-GMM) uncovers the underlying distribution of data conditioned on prior clustering preferences, expressed as pairwise constraints. These constraints guide the clustering process towards a desirable partition of the data by indicating which samples should or should not belong to the same cluster. We provide extensive experiments to demonstrate that DC-GMM shows superior clustering performance and robustness compared to state-of-the-art deep constrained clustering methods on a wide range of data sets. We further demonstrate the usefulness of our approach on two challenging real-world applications.
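The pairwise constraints described here can be pictured as two lists of index pairs: samples that should share a cluster and samples that should not. The sketch below only checks which constraints a given hard cluster assignment violates. It is not DC-GMM, which treats the constraints probabilistically inside a generative model; the function name and the example data are hypothetical.

```python
def violated_constraints(assignments, must_link, cannot_link):
    """List the pairwise constraints that a hard cluster assignment breaks.

    assignments: cluster label per sample, indexed by sample id.
    must_link / cannot_link: lists of (i, j) sample-index pairs.
    """
    violations = []
    for i, j in must_link:
        if assignments[i] != assignments[j]:
            violations.append(("must-link", i, j))
    for i, j in cannot_link:
        if assignments[i] == assignments[j]:
            violations.append(("cannot-link", i, j))
    return violations


# Five samples assigned to two clusters, with two constraints of each kind.
labels = [0, 0, 1, 1, 0]
must = [(0, 1), (2, 4)]
cannot = [(0, 2), (3, 2)]
broken = violated_constraints(labels, must, cannot)
```

Here samples 2 and 4 were meant to share a cluster but do not, and samples 3 and 2 were meant to be separated but are not, so both of those constraints are reported.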




Appendix Table of Contents

Neural Information Processing Systems

The naive aggregation of these public datasets results in a database with partial and incomplete labels; e.g., LiTS only has labels for the liver and its tumors, and KiTS only has labels for the kidneys and their tumors. In contrast, our AbdomenAtlas 1.0 is fully annotated, offering detailed per-voxel labels.

Figure 3: Anatomical boundaries and structures can be indistinct due to disease, as seen in the JHH dataset. We display CT volumes from patients in unhealthy conditions that are challenging for most AI algorithms to identify.

The encoder performs down-sampling operations and is designed to capture high-level semantics and context information.




Boomda: Balanced Multi-objective Optimization for Multimodal Domain Adaptation

Sun, Jun, Zhang, Xinxin, Hong, Simin, Zhu, Jian, Gao, Xiang

arXiv.org Artificial Intelligence

Multimodal learning, while contributing to numerous success stories across various fields, faces the challenge of prohibitively expensive manual annotation. To address the scarcity of annotated data, a popular solution is unsupervised domain adaptation, which has been extensively studied in unimodal settings yet remains less explored in multimodal settings. In this paper, we investigate heterogeneous multimodal domain adaptation, where the primary challenge is the varying domain shifts of different modalities from the source to the target domain. We first introduce the information bottleneck method to learn representations for each modality independently, and then match the source and target domains in the representation space with correlation alignment. To balance the domain alignment of all modalities, we formulate the problem as a multi-objective task, aiming for a Pareto optimal solution. By exploiting the properties specific to our model, the problem can be simplified to a quadratic programming problem. Further approximation yields a closed-form solution, leading to an efficient modality-balanced multimodal domain adaptation algorithm. The proposed method, termed Boomda, features balanced multi-objective optimization for multimodal domain adaptation. Extensive empirical results showcase the effectiveness of the proposed approach and demonstrate that Boomda outperforms the competing schemes.
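Correlation alignment, mentioned in this abstract, is commonly implemented as the CORAL loss: the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1 / (4 d^2) for d-dimensional features. The sketch below computes that standard loss for one modality in plain Python. It is an illustration under that assumption, not the Boomda algorithm: the information bottleneck encoder, the multi-objective balancing across modalities, and the closed-form solution are not shown, and the function names are hypothetical.

```python
def covariance(rows):
    """Unbiased sample covariance matrix of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [
        [
            sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(d)
        ]
        for a in range(d)
    ]


def coral_loss(source, target):
    """CORAL loss: squared Frobenius norm of the covariance gap over 4 * d^2."""
    d = len(source[0])
    cov_s, cov_t = covariance(source), covariance(target)
    sq_frobenius = sum(
        (cov_s[a][b] - cov_t[a][b]) ** 2 for a in range(d) for b in range(d)
    )
    return sq_frobenius / (4 * d * d)


# Toy 2-dimensional features for one modality in each domain.
source_feats = [[0.0, 0.0], [2.0, 2.0]]   # covariance [[2, 2], [2, 2]]
target_feats = [[0.0, 0.0], [2.0, 0.0]]   # covariance [[2, 0], [0, 0]]
loss = coral_loss(source_feats, target_feats)  # (4 + 4 + 4) / 16
```

With several modalities, one such loss per modality gives the set of alignment objectives that a multi-objective formulation would then have to balance.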