AITopics

2508.03156

Country: Europe > Germany (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Pham, Ninh, Zheng, Yingtao, Phibbs, Hugo

Scalable Varied-Density Clustering via Graph Propagation

arXiv.org Artificial IntelligenceAug-6-2025

We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based clustering with graph connectivity, enabling the use of efficient graph propagation techniques developed in network science. To ensure scalability, we introduce a density-aware neighborhood propagation algorithm and leverage advanced random projection methods to construct approximate neighborhood graphs. Our approach significantly reduces computational cost while preserving clustering quality. Empirically, it scales to datasets with millions of points in minutes and achieves competitive accuracy compared to existing baselines.

artificial intelligence, data mining, machine learning, (19 more...)

2508.02989

Country: Oceania > New Zealand (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceAug-6-2025

Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs

Hu, Yaowen, Tu, Wenxuan, Liu, Yue, Li, Miaomiao, Lu, Wenpeng, Luo, Zhigang, Liu, Xinwang, Chen, Ping

Deep graph clustering (DGC) for attribute-missing graphs is an unsupervised task aimed at partitioning nodes with incomplete attributes into distinct clusters. Addressing this challenging issue is vital for practical applications. However, research in this area remains underexplored. Existing imputation methods for attribute-missing graphs often fail to account for the varying amounts of information available across node neighborhoods, leading to unreliable results, especially for nodes with insufficient known neighborhood. To address this issue, we propose a novel method named Divide-Then-Rule Graph Completion (DTRGC). This method first addresses nodes with sufficient known neighborhood information and treats the imputed results as new knowledge to iteratively impute more challenging nodes, while leveraging clustering information to correct imputation errors. Specifically, Dynamic Cluster-Aware Feature Propagation (DCFP) initializes missing node attributes by adjusting propagation weights based on the clustering structure. Subsequently, Hierarchical Neighborhood-aware Imputation (HNAI) categorizes attribute-missing nodes into three groups based on the completeness of their neighborhood attributes. The imputation is performed hierarchically, prioritizing the groups with nodes that have the most available neighborhood information. The cluster structure is then used to refine the imputation and correct potential errors. Finally, Hop-wise Representation Enhancement (HRE) integrates information across multiple hops, thereby enriching the expressiveness of node representations. Experimental results on six widely used graph datasets show that DTRGC significantly improves the clustering performance of various DGC methods under attribute-missing graphs.

data mining, machine learning, node, (15 more...)

2507.10595

Country:

Asia > China (0.69)
North America > United States (0.46)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Allem, Luiz Emilio, Avrachenkov, Konstantin, Hoppen, Carlos, Manjunath, Hariprasad, Sibemberg, Lucas Siviero

Multi-Community Spectral Clustering for Geometric Graphs

arXiv.org Machine LearningAug-5-2025

In this paper, we consider the soft geometric block model (SGBM) with a fixed number $k \geq 2$ of homogeneous communities in the dense regime, and we introduce a spectral clustering algorithm for community recovery on graphs generated by this model. Given such a graph, the algorithm produces an embedding into $\mathbb{R}^{k-1}$ using the eigenvectors associated with the $k-1$ eigenvalues of the adjacency matrix of the graph that are closest to a value determined by the parameters of the model. It then applies $k$-means clustering to the embedding. We prove weak consistency and show that a simple local refinement step ensures strong consistency. A key ingredient is an application of a non-standard version of Davis-Kahan theorem to control eigenspace perturbations when eigenvalues are not simple. We also analyze the limiting spectrum of the adjacency matrix, using a combination of combinatorial and matrix techniques.

artificial intelligence, machine learning, matrix, (16 more...)

arXiv.org Machine Learning

2508.00893

Country:

South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Dynamic Clustering for Personalized Federated Learning on Heterogeneous Edge Devices

Liu, Heting, Huang, Junzhe, He, Fang, Cao, Guohong

--Federated Learning (FL) enables edge devices to collaboratively learn a global model, but it may not perform well when clients have high data heterogeneity. In this paper, we propose a dynamic clustering algorithm for personalized federated learning system ( DC-PFL) to address the problem of data heterogeneity. DC-PFL starts with all clients training a global model and gradually groups the clients into smaller clusters for model personalization based on their data similarities. T o address the challenge of estimating data heterogeneity without exposing raw data, we introduce a discrepancy metric called model discrepancy, which approximates data heterogeneity solely based on the model weights received by the server . We demonstrate that model discrepancy is strongly and positively correlated with data heterogeneity and can serve as a reliable indicator of data heterogeneity. T o determine when and how to change grouping structures, we propose an algorithm based on the rapid decrease period of the training loss curve. Moreover, we propose a layer-wise aggregation mechanism that aggregates the low-discrepancy layers at a lower frequency to reduce the amount of transmitted data and communication costs. We conduct extensive experiments on various datasets to evaluate our proposed algorithm, and our results show that DC-PFL significantly reduces total training time and improves model accuracy compared to baselines. In recent years, there has been significant growth in intelligent applications based on deep neural networks, driven by the availability of large amounts of data collected from the Internet. However, the increasing concern about privacy and trust risks associated with this kind of data collection cannot be ignored [1].

artificial intelligence, data heterogeneity, machine learning, (18 more...)

2508.0158

Genre: Research Report > New Finding (0.86)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Nagori, Aditya, Gautam, Ayush, Wiens, Matthew O., Nguyen, Vuong, Mugisha, Nathan Kenya, Kabakyenga, Jerome, Kissoon, Niranjan, Ansermino, John Mark, Kamaleswaran, Rishikesan

Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation(LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP and FAMD-reduced mixed data. Silhouette scores and statistical tests evaluated cluster quality and distinctiveness. Stella-En-400M-V5 achieved the highest Silhouette Score (0.86). LLAMA 3.1 8B with the clustering objective performed better with higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.

large language model, machine learning, natural language, (20 more...)

2505.09805

Country:

North America > United States (0.46)
Africa > Uganda (0.29)
North America > Canada > British Columbia (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mahmud, Raj, Berkovsky, Shlomo, Prasad, Mukesh, Kocaballi, A. Baki

Understanding User Preferences for Interaction Styles in Conversational Recommender Systems: The Predictive Role of System Qualities, User Experience, and Traits

Conversational Recommender Systems (CRSs) deliver personalised recommendations through multi-turn natural language dialogue and increasingly support both task-oriented and exploratory interactions. Yet, the factors shaping user interaction preferences remain underexplored. In this within-subjects study ($N = 139$), participants experienced two scripted CRS dialogues, rated their experiences, and indicated the importance of eight system qualities. Logistic regression revealed that preference for the exploratory interaction was predicted by enjoyment, usefulness, novelty, and conversational quality. Unexpectedly, perceived effectiveness was also associated with exploratory preference. Clustering uncovered five latent user profiles with distinct dialogue style preferences. Moderation analyses indicated that age, gender, and control preference significantly influenced these choices. These findings integrate affective, cognitive, and trait-level predictors into CRS user modelling and inform autonomy-sensitive, value-adaptive dialogue design. The proposed predictive and adaptive framework applies broadly to conversational AI systems seeking to align dynamically with evolving user needs.

artificial intelligence, machine learning, natural language, (18 more...)

2508.02328

Country:

North America > United States > California (0.28)
Oceania > Australia > New South Wales > Sydney (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (0.68)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
(2 more...)

A Group Consensus-Driven Auction Algorithm for Cooperative Task Allocation Among Heterogeneous Multi-Agents

Wang, Gang, Han, Hongfang, Liu, Xiaowei, Jiang, Hanfeng, Zhang, Ming

In scenarios like automated warehouses, assigning tasks to robots presents a heterogeneous multi-task and multi-agent task allocation problem. However, existing task allocation study ignores the integration of multi-task and multi-attribute agent task allocation with heterogeneous task allocation. In addition, current algorithms are limited by scenario constraints and can incur significant errors in specific contexts. Therefore, this study proposes a distributed heterogeneous multi-task and multi-agent task allocation algorithm with a time window, called group consensus-based heterogeneous auction (GCBHA). Firstly, this method decomposes tasks that exceed the capability of a single Agent into subtasks that can be completed by multiple independent agents. And then groups similar or adjacent tasks through a heuristic clustering method to reduce the time required to reach a consensus. Subsequently, the task groups are allocated to agents that meet the conditions through an auction process. Furthermore, the method evaluates the task path cost distance based on the scenario, which can calculate the task cost more accurately. The experimental results demonstrate that GCBHA performs well in terms of task allocation time and solution quality, with a significant reduction in the error rate between predicted task costs and actual costs.

agent, artificial intelligence, machine learning, (16 more...)

2508.02015

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.48)

Industry: Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

arXiv.org Artificial IntelligenceAug-4-2025

From EMR Data to Clinical Insight: An LLM-Driven Framework for Automated Pre-Consultation Questionnaire Generation

Ding, Ruiqing, Sun, Qianfang, Leng, Yongkang, Yin, Hui, Li, Xiaojian

Pre-consultation is a critical component of effective healthcare delivery. However, generating comprehensive pre-consultation questionnaires from complex, voluminous Electronic Medical Records (EMRs) is a challenging task. Direct Large Language Model (LLM) approaches face difficulties in this task, particularly regarding information completeness, logical order, and disease-level synthesis. To address this issue, we propose a novel multi-stage LLM-driven framework: Stage 1 extracts atomic assertions (key facts with timing) from EMRs; Stage 2 constructs personal causal networks and synthesizes disease knowledge by clustering representative networks from an EMR corpus; Stage 3 generates tailored personal and standardized disease-specific questionnaires based on these structured representations. This framework overcomes limitations of direct methods by building explicit clinical knowledge. Evaluated on a real-world EMR dataset and validated by clinical experts, our method demonstrates superior performance in information coverage, diagnostic relevance, understandability, and generation time, highlighting its practical potential to enhance patient information collection.

large language model, machine learning, natural language, (19 more...)

2508.00581

Country: Asia > China (0.28)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.93)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Machine LearningAug-1-2025

AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

Liao, Ling, Aagaard, Eva

However, the in t rinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup - specific modelin g . W e propose AdaptHetero, a novel MLI - driven framework that transforms interpretability insights into actionable guidance for tailor ing model training and evaluation across subpopulations within individual hospital systems . E valuated on th ree large - scale EH R datasets -- GOSSIS - 1 - eICU, WiDS, and MIMIC - IV -- AdaptHetero consistently identif ies heterogeneous model behaviors in predicting ICU mortality, in - hospital death, and hidden hypoxemia. By integrating SHAP - based interpretation and unsupervised clustering, the framework enhances the identification of clinicall y meaningful subgroup - specific characteristics, leading to improved predictive performance and optimized clinical deployment . Introduction Machine learning interpretation (MLI) techniques are increasingly leveraged in the analysis of electronic health records (EHRs) to reveal latent clinical patterns and to support trustworthy, actionable decision - making in high - stakes healthcare settings .

artificial intelligence, machine learning, subgroup, (14 more...)

arXiv.org Machine Learning

2507.21197

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Israel (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.55)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)