Goto

Collaborating Authors

 Clustering


A Novel Vascular Risk Scoring Framework for Quantifying Sex-Specific Cerebral Perfusion from 3D pCASL MRI

arXiv.org Artificial Intelligence

ABSTRACT The influence of sex and age on cerebral perfusion is recognized, but the specific impacts on regional cerebral blood flow (CBF) and vascular risk remain to be fully characterized. In this study, 3D pseudo-continuous arterial spin labeling (pCASL) MRI was used to identify sex and age related CBF patterns, and a vascular risk score (VRS) was developed based on normative perfusion profiles. Perfusion data from 186 cognitively healthy participants (89 males, 97 females; aged 8 to 92 years), obtained from a publicly available dataset, were analyzed. An extension of the 3D Simple Linear Iterative Clustering (SLIC) supervoxel algorithm was applied to CBF maps to group neighboring voxels with similar intensities into anatomically meaningful regions. Regional CBF features were extracted and used to train a convolutional neural network (CNN) for sex classification and perfusion pattern analysis. Global, age related CBF changes were also assessed. Participant specific VRS was computed by comparing individual CBF profiles to age and sex specific normative data to quantify perfusion deficits. A 95 percent accuracy in sex classification was achieved using the proposed supervoxel based method, and distinct perfusion signatures were identified. Higher CBF was observed in females in medial Brod-mann areas 6 and 10, area V5, occipital polar cortex, and insular regions. A global decline in CBF with age was observed in both sexes. Individual perfusion deficits were quantified using VRS, providing a personalized biomarker for early hy-poperfusion. Sex and age specific CBF patterns were identified, and a personalized vascular risk biomarker was proposed, contributing to advancements in precision neurology. Keywords-- 3D pCASL MRI, CBF, age-and sex-specific perfusion patterns, vascular risk score, cognitively healthy 1. INTRODUCTION Arterial Spin Labeling (ASL) is a non-invasive Magnetic Resonance Imaging (MRI) technique designed to quantitatively assess cerebral blood flow (CBF) by magnetically labeling endogenous arterial blood water protons without the need for exogenous contrast agents or ionizing radiation [1]. The ASL technique involves three key steps: (i) magnetic labeling of arterial blood proximal to the imaging region, (ii) delivery of magnetically tagged blood to brain tissue altering the local MR signal, and (iii) acquisition of paired labeled and control images whose subtraction yields perfusion-weighted maps [1].


ITL-LIME: Instance-Based Transfer Learning for Enhancing Local Explanations in Low-Resource Data Settings

arXiv.org Artificial Intelligence

Explainable Artificial Intelligence (XAI) methods, such as Local Interpretable Model-Agnostic Explanations (LIME), have advanced the interpretability of black-box machine learning models by approximating their behavior locally using interpretable surrogate models. However, LIME's inherent randomness in perturbation and sampling can lead to locality and instability issues, especially in scenarios with limited training data. In such cases, data scarcity can result in the generation of unrealistic variations and samples that deviate from the true data manifold. Consequently, the surrogate model may fail to accurately approximate the complex decision boundary of the original model. To address these challenges, we propose a novel Instance-based Transfer Learning LIME framework (ITL-LIME) that enhances explanation fidelity and stability in data-constrained environments. ITL-LIME introduces instance transfer learning into the LIME framework by leveraging relevant real instances from a related source domain to aid the explanation process in the target domain. Specifically, we employ clustering to partition the source domain into clusters with representative prototypes. Instead of generating random perturbations, our method retrieves pertinent real source instances from the source cluster whose prototype is most similar to the target instance. These are then combined with the target instance's neighboring real instances. To define a compact locality, we further construct a contrastive learning-based encoder as a weighting mechanism to assign weights to the instances from the combined set based on their proximity to the target instance. Finally, these weighted source and target instances are used to train the surrogate model for explanation purposes.


Cequel: Cost-Effective Querying of Large Language Models for Text Clustering

arXiv.org Artificial Intelligence

Text clustering aims to automatically partition a collection of documents into coherent groups based on their linguistic features. In the literature, this task is formulated either as metric clustering over pre-trained text embeddings or as graph clustering based on pairwise similarities derived from an oracle, e.g., a large machine learning model. Recent advances in large language models (LLMs) have significantly improved this field by providing high-quality contextualized embeddings and accurate semantic similarity estimates. However, leveraging LLMs at scale introduces substantial computational and financial costs due to the large number of required API queries or inference calls. To address this issue, we propose Cequel, a cost-effective framework that achieves accurate text clustering under a limited budget of LLM queries. At its core, Cequel constructs must-link and cannot-link constraints by selectively querying LLMs on informative text pairs or triplets, identified via our proposed algorithms, EdgeLLM and TriangleLLM. These constraints are then utilized in a weighted constrained clustering algorithm to form high-quality clusters. Specifically, EdgeLLM and TriangleLLM employ carefully designed greedy selection strategies and prompting techniques to identify and extract informative constraints efficiently. Experiments on multiple benchmark datasets demonstrate that Cequel consistently outperforms existing methods in unsupervised text clustering under the same query budget.


MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning

arXiv.org Artificial Intelligence

In the era of big data, data mining has become indispensable for uncovering hidden patterns and insights from vast and complex datasets. The integration of multimodal data sources further enhances its potential. Multimodal Federated Learning (MFL) is a distributed approach that enhances the efficiency and quality of multimodal learning, ensuring collaborative work and privacy protection. However, missing modalities pose a significant challenge in MFL, often due to data quality issues or privacy policies across the clients. In this work, we present MMiC, a framework for Mitigating Modality incompleteness in MFL within the Clusters. MMiC replaces partial parameters within client models inside clusters to mitigate the impact of missing modalities. Furthermore, it leverages the Banzhaf Power Index to optimize client selection under these conditions. Finally, MMiC employs an innovative approach to dynamically control global aggregation by utilizing Markovitz Portfolio Optimization. Extensive experiments demonstrate that MMiC consistently outperforms existing federated learning architectures in both global and personalized performance on multimodal datasets with missing modalities, confirming the effectiveness of our proposed solution. Our code is available at https://github.com/gotobcn8/MMiC.


Federated Learning based on Self-Evolving Gaussian Clustering

arXiv.org Artificial Intelligence

In this study, we present an Evolving Fuzzy System within the context of Federated Learning, which adapts dynamically with the addition of new clusters and therefore does not require the number of clusters to be selected apriori. Unlike traditional methods, Federated Learning allows models to be trained locally on clients' devices, sharing only the model parameters with a central server instead of the data. Our method, implemented using PyTorch, was tested on clustering and classification tasks. The results show that our approach outperforms established classification methods on several well-known UCI datasets. While computationally intensive due to overlap condition calculations, the proposed method demonstrates significant advantages in decentralized data processing.


Context Steering: A New Paradigm for Compression-based Embeddings by Synthesizing Relevant Information Features

arXiv.org Artificial Intelligence

Compression-based distances (CD) offer a flexible and domain-agnostic means of measuring similarity by identifying implicit information through redundancies between data objects. However, as similarity features are derived from the data, rather than defined as an input, it often proves difficult to align with the task at hand, particularly in complex clustering or classification settings. To address this issue, we introduce "context steering," a novel methodology that actively guides the feature-shaping process. Instead of passively accepting the emergent data structure (typically a hierarchy derived from clustering CDs), our approach "steers" the process by systematically analyzing how each object influences the relational context within a clustering framework. This process generates a custom-tailored embedding that isolates and amplifies class-distinctive information. We validate the capabilities of this strategy using Normalized Compression Distance (NCD) and Relative Compression Distance (NRC) with common hierarchical clustering, providing an effective alternative to common transductive methods. Experimental results across heterogeneous datasets-from text to real-world audio-validate the robustness and generality of context steering, marking a fundamental shift in their application: from merely discovering inherent data structures to actively shaping a feature space tailored to a specific objective.


A Survey on Video Anomaly Detection via Deep Learning: Human, Vehicle, and Environment

arXiv.org Artificial Intelligence

Video Anomaly Detection (VAD) has emerged as a pivotal task in computer vision, with broad relevance across multiple fields. Recent advances in deep learning have driven significant progress in this area, yet the field remains fragmented across domains and learning paradigms. This survey offers a comprehensive perspective on VAD, systematically organizing the literature across various supervision levels, as well as adaptive learning methods such as online, active, and continual learning. We examine the state of VAD across three major application categories: human-centric, vehicle-centric, and environment-centric scenarios, each with distinct challenges and design considerations. In doing so, we identify fundamental contributions and limitations of current methodologies. By consolidating insights from subfields, we aim to provide the community with a structured foundation for advancing both theoretical understanding and real-world applicability of VAD systems. This survey aims to support researchers by providing a useful reference, while also drawing attention to the broader set of open challenges in anomaly detection, including both fundamental research questions and practical obstacles to real-world deployment.




Neural Diffusion Distance for Image Segmentation

Neural Information Processing Systems

The network is a differentiable deep architecture consisting of feature extraction and diffusion distance modules for computing diffusion distance on image by end-to-end training. We design low resolution kernel matching loss and high resolution segment matching loss to enforce the network's output to be consistent with human-labeled image segments.