Plotting

 Belucci, Bruno


CoHiRF: A Scalable and Interpretable Clustering Framework for High-Dimensional Data

arXiv.org Machine Learning

Clustering high-dimensional data poses significant challenges due to the curse of dimensionality, scalability issues, and the presence of noisy and irrelevant features. We propose Consensus Hierarchical Random Feature (CoHiRF), a novel clustering method designed to address these challenges effectively. CoHiRF leverages random feature selection to mitigate noise and dimensionality effects, repeatedly applies K-Means clustering in reduced feature spaces, and combines results through a unanimous consensus criterion. This iterative approach constructs a cluster assignment matrix, where each row records the cluster assignments of a sample across repetitions, enabling the identification of stable clusters by comparing identical rows.

High-dimensional datasets suffer from the well-known "curse of dimensionality." As the dimensionality p increases, the relevant information often lies in a low-dimensional subspace, with the remaining dimensions contributing predominantly to noise. Consequently, data points tend to become equidistant in high-dimensional space, rendering traditional distance-based clustering algorithms, such as K-Means, less effective (Beyer et al., 1999). Specifically, the Euclidean distance metric loses its discriminative power, resulting in poor clustering performance. Another critical challenge is scalability: traditional clustering methods, originally designed for low-dimensional or small datasets, often struggle with high computational and memory demands when applied to high-dimensional data settings (Steinbach et al., 2004; Assent, 2012; Zimek et al., 2012; Mahdi et al., 2021).
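The consensus step described in the abstract can be illustrated with a short sketch: sample random feature subsets, run K-Means on each, and group samples whose rows of the assignment matrix are identical. This is a minimal, single-round approximation under stated assumptions, not the authors' full hierarchical CoHiRF procedure; the function and parameter names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def consensus_random_feature_kmeans(X, n_clusters=3, n_repetitions=10,
                                    n_features=5, random_state=0):
    """One consensus round: K-Means on random feature subspaces, then a
    unanimous-consensus grouping of samples with identical assignment rows."""
    rng = np.random.default_rng(random_state)
    n_samples, n_total_features = X.shape
    # Cluster assignment matrix: one column per repetition.
    assignments = np.empty((n_samples, n_repetitions), dtype=int)
    for r in range(n_repetitions):
        # Random feature subset to mitigate noise and dimensionality effects.
        cols = rng.choice(n_total_features,
                          size=min(n_features, n_total_features), replace=False)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=r)
        assignments[:, r] = km.fit_predict(X[:, cols])
    # Samples sharing an identical row across repetitions form a consensus cluster.
    _, consensus_labels = np.unique(assignments, axis=0, return_inverse=True)
    return consensus_labels

# Toy usage: a few informative dimensions buried in many noise dimensions.
from sklearn.datasets import make_blobs
X_info, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
X = np.hstack([X_info, np.random.default_rng(0).normal(size=(300, 95))])
labels = consensus_random_feature_kmeans(X, n_clusters=3)
print("number of consensus clusters:", len(np.unique(labels)))
```

In the paper's hierarchical scheme this step would be applied repeatedly, with consensus clusters carried into the next round; the sketch stops after a single round for clarity.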


Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation

arXiv.org Artificial Intelligence

Due to high heterogeneity in pose and size and the limited amount of available data, segmentation of pediatric images is challenging for deep learning methods. In this work, we propose a new CNN architecture that is pose and scale invariant thanks to the use of a Spatial Transformer Network (STN). Our architecture is composed of three sequential modules that are estimated together during training: (i) a regression module to estimate a similarity matrix that normalizes the input image to a reference one; (ii) a differentiable module to find the region of interest to segment; (iii) a segmentation module, based on the popular UNet architecture, to delineate the object. Unlike the original UNet, which strives to learn a complex mapping, including pose and scale variations, from a finite training dataset, our segmentation module learns a simpler mapping focused on images with normalized pose and size. Furthermore, the automatic bounding box detection through the STN saves time and, especially, memory, while keeping similar performance. We test the proposed method on kidney and renal tumor segmentation in abdominal pediatric CT scans. Results indicate that the estimated STN homogenization of size and pose accelerates the segmentation (25 h, compared to 33 h with standard data augmentation), while obtaining similar quality for the kidney (88.01% Dice score) and improving the renal tumor delineation (from 85.52% to 87.12%).
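As a rough illustration of the three-module design described above, the following PyTorch sketch wires together a small regression network that predicts an affine (similarity-like) transform, a differentiable resampling step (affine_grid / grid_sample), and a placeholder convolutional head standing in for the UNet segmentation module. Layer sizes are assumptions and the ROI cropping module is omitted; this is not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseSizeNormalizingSegmenter(nn.Module):
    """STN-style sketch: (i) regress affine parameters, (ii) resample the image
    into a normalized pose/size, (iii) segment the normalized image."""

    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        # (i) Regression (localization) module: predicts a 2x3 affine matrix.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Start from the identity transform so training begins with "no warp".
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))
        # (iii) Segmentation module (small conv stack as a stand-in for a UNet).
        self.segmenter = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, 1),
        )

    def forward(self, x):
        theta = self.localization(x).view(-1, 2, 3)
        # (ii) Differentiable resampling onto a normalized reference grid.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x_norm = F.grid_sample(x, grid, align_corners=False)
        return self.segmenter(x_norm)

# Toy forward pass on a random batch standing in for 2D CT slices.
model = PoseSizeNormalizingSegmenter()
logits = model(torch.randn(2, 1, 128, 128))
print(logits.shape)  # torch.Size([2, 2, 128, 128])
```

Because the resampling is differentiable, the transform regressor and the segmentation head can be trained jointly end to end, which is the property the paper relies on.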