AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM

Wang, Yongxu, Cao, Xu, Yi, Weiyun, Fan, Zhaoxin

arXiv.org Artificial IntelligenceMar-27-2025

-- Simultaneous Localization and Mapping (SLAM) is a critical task in robotics, enabling systems to autonomously navigate and understand complex environments. Current SLAM approaches predominantly rely on geometric cues for mapping and localization, but they often fail to ensure semantic consistency, particularly in dynamic or densely populated scenes. T o address this limitation, we introduce ST AMICS, a novel method that integrates semantic information with 3D Gaussian representations to enhance both localization and mapping accuracy. ST AMICS consists of three key components: a 3D Gaussian-based scene representation for high-fidelity reconstruction, a graph-based clustering technique that enforces temporal semantic consistency, and an open-vocabulary system that allows for the classification of unseen objects. Extensive experiments show that ST AMICS significantly improves camera pose estimation and map quality, outperforming state-of-the-art methods while reducing reconstruction errors. Code will be public available. I. INTRODUCTION Simultaneous Localization and Mapping (SLAM) is a crucial technology in fields such as autonomous driving, robotics, and augmented reality, enabling systems to perceive, map, and navigate complex environments in real time [11], [32]. The ability to accurately localize and construct detailed maps is fundamental for autonomous systems to interact safely and effectively with their surroundings.

consistency, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.21425

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > Italy > Lazio > Rome (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > Promising Solution (0.86)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

A Methodology to extract Geo-Referenced Standard Routes from AIS Data

Corvino, Michela, Daffinà, Filippo, Francalanci, Chiara, Giacomazzi, Paolo, Magliani, Martina, Ravanelli, Paolo, Stahl, Torbjorn

arXiv.org Artificial IntelligenceMar-26-2025

Maritime AIS (Automatic Identification Systems) data serve as a valuable resource for studying vessel behavior. This study proposes a methodology to analyze route between maritime points of interest and extract geo-referenced standard routes, as maritime patterns of life, from raw AIS data. The underlying assumption is that ships adhere to consistent patterns when travelling in certain maritime areas due to geographical, environmental, or economic factors. Deviations from these patterns may be attributed to weather conditions, seasonality, or illicit activities. This enables maritime surveillance authorities to analyze the navigational behavior between ports, providing insights on vessel route patterns, possibly categorized by vessel characteristics (type, flag, or size). Our methodological process begins by segmenting AIS data into distinct routes using a finite state machine (FSM), which describes routes as seg-ments connecting pairs of points of interest. The extracted segments are ag-gregated based on their departure and destination ports and then modelled using iterative density-based clustering to connect these ports. The cluster-ing parameters are assigned manually to sample and then extended to the en-tire dataset using linear regression. Overall, the approach proposed in this paper is unsupervised and does not require any ground truth to be trained. The approach has been tested on data on the on a six-year AIS dataset cover-ing the Arctic region and the Europe, Middle East, North Africa areas. The total size of our dataset is 1.15 Tbytes. The approach has proved effective in extracting standard routes, with less than 5% outliers, mostly due to routes with either their departure or their destination port not included in the test areas.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.22734

Country:

Africa > North Africa (0.25)
Europe > Sweden > Örebro County > Örebro (0.04)
Europe > Italy > Lombardy > Milan (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry:

Government > Military (0.66)
Transportation > Marine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Geographical hotspot prediction based on point cloud-voxel-community partition clustering

Tang, Yan

arXiv.org Artificial IntelligenceMar-26-2025

Existing solutions to the hotspot prediction problem in the field of geographic information remain at a relatively preliminary stage. This study presents a novel approach for detecting and predicting geographical hotspots, utilizing point cloud-voxel-community partition clustering. By analyzing high-dimensional data, we represent spatial information through point clouds, which are then subdivided into multiple voxels to enhance analytical efficiency. Our method identifies spatial voxels with similar characteristics through community partitioning, thereby revealing underlying patterns in hotspot distributions. Experimental results indicate that when applied to a dataset of archaeological sites in Turkey, our approach achieves a 19.31% increase in processing speed, with an accuracy loss of merely 6%, outperforming traditional clustering methods. This method not only provides a fresh perspective for hotspot prediction but also serves as an effective tool for high-dimensional data analysis.

artificial intelligence, hotspot prediction, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2503.21084

Country:

Asia > Middle East > Republic of Türkiye (0.26)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

Adaptive Local Clustering over Attributed Graphs

Zheng, Haoran, Yang, Renchi, Xu, Jianliang

arXiv.org Artificial IntelligenceMar-26-2025

Given a graph $G$ and a seed node $v_s$, the objective of local graph clustering (LGC) is to identify a subgraph $C_s \in G$ (a.k.a. local cluster) surrounding $v_s$ in time roughly linear with the size of $C_s$. This approach yields personalized clusters without needing to access the entire graph, which makes it highly suitable for numerous applications involving large graphs. However, most existing solutions merely rely on the topological connectivity between nodes in $G$, rendering them vulnerable to missing or noisy links that are commonly present in real-world graphs. To address this issue, this paper resorts to leveraging the complementary nature of graph topology and node attributes to enhance local clustering quality. To effectively exploit the attribute information, we first formulate the LGC as an estimation of the bidirectional diffusion distribution (BDD), which is specialized for capturing the multi-hop affinity between nodes in the presence of attributes. Furthermore, we propose LACA, an efficient and effective approach for LGC that achieves superb empirical performance on multiple real datasets while maintaining strong locality. The core components of LACA include (i) a fast and theoretically-grounded preprocessing technique for node attributes, (ii) an adaptive algorithm for diffusing any vectors over $G$ with rigorous theoretical guarantees and expedited convergence, and (iii) an effective three-step scheme for BDD approximation. Extensive experiments, comparing 17 competitors on 8 real datasets, show that LACA outperforms all competitors in terms of result quality measured against ground truth local clusters, while also being up to orders of magnitude faster. The code is available at https://github.com/HaoranZ99/alac.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.20488

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Social Media (0.95)
(2 more...)

Add feedback

A Systematic Review of EEG-based Machine Intelligence Algorithms for Depression Diagnosis, and Monitoring

Nassibi, Amir, Papavassiliou, Christos, Rakhmatulin, Ildar, Mandic, Danilo, Atashzar, S. Farokh

arXiv.org Artificial IntelligenceMar-25-2025

Depression disorder is a serious health condition that has affected the lives of millions of people around the world. Diagnosis of depression is a challenging practice that relies heavily on subjective studies and, in most cases, suffers from late findings. Electroencephalography (EEG) biomarkers have been suggested and investigated in recent years as a potential transformative objective practice. In this article, for the first time, a detailed systematic review of EEG-based depression diagnosis approaches is conducted using advanced machine learning techniques and statistical analyses. For this, 938 potentially relevant articles (since 1985) were initially detected and filtered into 139 relevant articles based on the review scheme 'preferred reporting items for systematic reviews and meta-analyses (PRISMA).' This article compares and discusses the selected articles and categorizes them according to the type of machine learning techniques and statistical analyses. Algorithms, preprocessing techniques, extracted features, and data acquisition systems are discussed and summarized. This review paper explains the existing challenges of the current algorithms and sheds light on the future direction of the field. This systematic review outlines the issues and challenges in machine intelligence for the diagnosis of EEG depression that can be addressed in future studies and possibly in future wearable technologies.

data quality, evolutionary algorithm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.1982

Country:

Asia > China > Gansu Province > Lanzhou (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(29 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.93)
(5 more...)

Add feedback

RCC-PFL: Robust Client Clustering under Noisy Labels in Personalized Federated Learning

Ali, Abdulmoneam, Arafa, Ahmed

arXiv.org Artificial IntelligenceMar-25-2025

We address the problem of cluster identity estimation in a personalized federated learning (PFL) setting in which users aim to learn different personal models. The backbone of effective learning in such a setting is to cluster users into groups whose objectives are similar. A typical approach in the literature is to achieve this by training users' data on different proposed personal models and assign them to groups based on which model achieves the lowest value of the users' loss functions. This process is to be done iteratively until group identities converge. A key challenge in such a setting arises when users have noisy labeled data, which may produce misleading values of their loss functions, and hence lead to ineffective clustering. To overcome this challenge, we propose a label-agnostic data similarity-based clustering algorithm, coined RCC-PFL, with three main advantages: the cluster identity estimation procedure is independent from the training labels; it is a one-shot clustering algorithm performed prior to the training; and it requires fewer communication rounds and less computation compared to iterative-based clustering methods. We validate our proposed algorithm using various models and datasets and show that it outperforms multiple baselines in terms of average accuracy and variance reduction.

artificial intelligence, machine learning, personalized federated learning, (3 more...)

arXiv.org Artificial Intelligence

2503.19886

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Add feedback

How to optimize K-means?

Li, Qi

arXiv.org Artificial IntelligenceMar-24-2025

Center-based clustering algorithms (e.g., K-means) are popular for clustering tasks, but they usually struggle to achieve high accuracy on complex datasets. We believe the main reason is that traditional center-based clustering algorithms identify only one clustering center in each cluster. Once the distribution of the dataset is complex, a single clustering center cannot strongly represent distant objects within the cluster. How to optimize the existing center-based clustering algorithms will be valuable research. In this paper, we propose a general optimization method called ECAC, and it can optimize different center-based clustering algorithms. ECAC is independent of the clustering principle and is embedded as a component between the center process and the category assignment process of center-based clustering algorithms. Specifically, ECAC identifies several extended-centers for each clustering center. The extended-centers will act as relays to expand the representative capability of the clustering center in the complex cluster, thus improving the accuracy of center-based clustering algorithms. We conducted numerous experiments to verify the robustness and effectiveness of ECAC. ECAC is robust to diverse datasets and diverse clustering centers. After ECAC optimization, the accuracy (NMI as well as RI) of center-based clustering algorithms improves by an average of 33.4% and 64.1%, respectively, and even K-means accurately identifies complex-shaped clusters.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.19324

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment

Embar, Varsha, Shrivastava, Ritvik, Damodaran, Vinay, Mehlinger, Travis, Hsiao, Yu-Chung, Raghunathan, Karthik

arXiv.org Artificial IntelligenceMar-24-2025

Large Language Models have transformed the Contact Center industry, manifesting in enhanced self-service tools, streamlined administrative processes, and augmented agent productivity. This paper delineates our system that automates call driver generation, which serves as the foundation for tasks such as topic modeling, incoming call classification, trend detection, and FAQ generation, delivering actionable insights for contact center agents and administrators to consume. We present a cost-efficient LLM system design, with 1) a comprehensive evaluation of proprietary, open-weight, and fine-tuned models and 2) cost-efficient strategies, and 3) the corresponding cost analysis when deployed in production environments.

call driver, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.1909

Country: Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Communication-aware planning for robot teams deployment

Marchukov, Yaroslav, Montano, Luis

arXiv.org Artificial IntelligenceMar-24-2025

Abstract: In the present work we address the problem of deploying a team of robots in a scenario where some locations of interest must be reached. Thus, a planning for a deployment is required, before sending the robots. The obstacles, the limited communication range, and the need of communicating to a base station, constrain the connectivity of the team and the deployment planning. We propose a method consisting of three algorithms: a distributed path planner to obtain communication-aware trajectories; a deployment planner providing dual-use of the robots, visiting primary goals and performing connectivity tasks; and a clustering algorithm to allocate the tasks to robots, and obtain the best goal visit order for the mission. Keywords: Multi-robot systems, deployment planning, communication-aware planning 1. INTRODUCTION characterize the signal in the environment, considering the variations suffered by the signal in the propagation media. The deployment of robot teams for exploration or environmental monitoring can be executed in many ways.

artificial intelligence, machine learning, robot, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.ifacol.2017.08.1210

2503.18545

Country: Europe > Spain > Aragón > Zaragoza Province > Zaragoza (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Learning a Class of Mixed Linear Regressions: Global Convergence under General Data Conditions

Liu, Yujing, Liu, Zhixin, Guo, Lei

arXiv.org Machine LearningMar-24-2025

Mixed linear regression (MLR) has attracted increasing attention because of its great theoretical and practical importance in capturing nonlinear relationships by utilizing a mixture of linear regression sub-models. Although considerable efforts have been devoted to the learning problem of such systems, i.e., estimating data labels and identifying model parameters, most existing investigations employ the offline algorithm, impose the strict independent and identically distributed (i.i.d.) or persistent excitation (PE) conditions on the regressor data, and provide local convergence results only. In this paper, we investigate the recursive estimation and data clustering problems for a class of stochastic MLRs with two components. To address this inherently nonconvex optimization problem, we propose a novel two-step recursive identification algorithm to estimate the true parameters, where the direction vector and the scaling coefficient of the unknown parameters are estimated by the least squares and the expectation-maximization (EM) principles, respectively. Under a general data condition, which is much weaker than the traditional i.i.d. and PE conditions, we establish the global convergence and the convergence rate of the proposed identification algorithm for the first time. Furthermore, we prove that, without any excitation condition on the regressor data, the data clustering performance including the cumulative mis-classification error and the within-cluster error can be optimal asymptotically. Finally, we provide a numerical example to illustrate the performance of the proposed learning algorithm.

artificial intelligence, general data condition, machine learning, (2 more...)

arXiv.org Machine Learning

2503.185

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Add feedback