AITopics | dbscan

Collaborating Authors

dbscan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Central Description Length (CDL) Clustering Validation Index

Shamsi, Mahdi, Beheshti, Soosan

arXiv.org Machine LearningJun-5-2026

Selecting a clustering algorithm and its hyperparameters without labels is a common difficulty in engineering machine learning pipelines that work with unsupervised analysis of sensor, image, or process data. Clustering validation indices (CVIs) provide internal scores for ranking candidate clusterings, but most popular CVIs are built from Euclidean compactness and separation terms and so tend to favour compact, convex partitions. Their performance is known to degrade on non convex, irregular, or variable density data, where kernel transformations or alternative distance measures are typically used at the cost of additional tuning and computation. This paper introduces the Central Description Length (CDL) clustering validation index. CDL uses the observed within cluster compactness, the estimated cluster centers, and the estimated cluster covariances to compute a probabilistic upper bound on the description length associated with the unobservable true cluster centers. The bound condenses intra cluster compactness and centroid displacement into a single computable quantity and is evaluated on the partition produced by any clustering algorithm. The implementation uses only observable quantities (the data, the partition, the estimated centers, and the estimated covariances) and does not use ground truth labels. On synthetic benchmarks with non convex and arbitrary shape clusters, CDL-CVI selected the reference number of clusters more often and reached higher Adjusted Rand Index (ARI) values than the conventional CVIs we tested, without an additional kernel preprocessing stage. On image benchmarks (MNIST, CIFAR-10, STL-10) clustered from frozen unsupervised embeddings, CDL-CVI returned cluster numbers close to the reference class counts across K-means, DBSCAN, and spectral clustering in the reported trials. We also discuss limitations of the approach, in particular its dependence on covariance estimation, the chosen distance metric, and the input representation. 1 Introduction Many engineering machine learning pipelines rely on the clustering of unlabeled measurements: fault diagnosis from vibration and acoustic signals, sensor state discovery in industrial processes, condition monitoring of mechanical and electrical systems, materials characterization, segmentation of images and signals, and exploratory grouping of process variables.

artificial intelligence, hyperparameter, machine learning, (17 more...)

arXiv.org Machine Learning

2606.0523

Country: North America > Canada > Ontario (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

31421b112e5f7faf4fc577b74e45dab2-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 18:45:00 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

Mass Distribution versus Density Distribution in the Context of Clustering

Ting, Kai Ming, Zhu, Ye, Zhang, Hang, Liang, Tianrun

arXiv.org Machine LearningJan-26-2026

This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has its fundamental limitation -- high-density bias, irrespective of the algorithms used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using the mass distribution as a better foundation, we propose a new algorithm which maximizes the total mass of all clusters, called mass-maximization clustering (MMC). The algorithm can be easily changed to maximize the total density of all clusters in order to examine the fundamental limitation of using density distribution versus mass distribution. The key advantage of the MMC over the density-maximization clustering is that the maximization is conducted without a bias towards dense clusters.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2601.10759

Country:

North America > United States (0.46)
Asia (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

SACA: Selective Attention-Based Clustering Algorithm

Bilehsavar, Meysam Shirdel, Ghaedi, Razieh, Taheri, Samira Seyed, Fan, Xinqi, O'Reilly, Christian

arXiv.org Artificial IntelligenceDec-1-2025

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the requirement of critical parameter tuning by users, which typically requires significant domain expertise. This paper introduces a novel density-based clustering algorithm loosely inspired by the concept of selective attention, designed to minimize reliance on parameter tuning for most applications. The proposed method computes an adaptive threshold to exclude sparsely distributed points and outliers, constructs an initial cluster framework, and subsequently reintegrates the filtered points to refine the final results. Extensive experiments on diverse benchmark datasets demonstrate the robustness, accuracy, and ease of use of the proposed approach, establishing it as a powerful alternative to conventional density-based clustering techniques.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.1715

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Emergent Lexical Semantics in Neural Language Models: Testing Martin's Law on LLM-Generated Text

Kugler, Kai

arXiv.org Artificial IntelligenceNov-27-2025

We present the first systematic investigation of Martin's Law - the empirical relationship between word frequency and polysemy - in text generated by neural language models during training. Using DBSCAN clustering of contextualized embeddings as an operationalization of word senses, we analyze four Pythia models (70M-1B parameters) across 30 training checkpoints. Our results reveal a non-monotonic developmental trajectory: Martin's Law emerges around checkpoint 100, reaches peak correlation (r > 0.6) at checkpoint 104, then degrades by checkpoint 105. Smaller models (70M, 160M) experience catastrophic semantic collapse at late checkpoints, while larger models (410M, 1B) show graceful degradation. The frequency-specificity trade-off remains stable (r $\approx$ -0.3) across all models. These findings suggest that compliance with linguistic regularities in LLM-generated text is not monotonically increasing with training, but instead follows a balanced trajectory with an optimal semantic window. This work establishes a novel methodology for evaluating emergent linguistic structure in neural language models.

large language model, machine learning, martin, (17 more...)

arXiv.org Artificial Intelligence

2511.21334

Genre: Research Report > New Finding (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)

Add feedback

GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN

Mohammadi, Ahmad, Ahmari, Reza, Hemmati, Vahid, Owusu-Ambrose, Frederick, Mahmoud, Mahmoud Nabil, Kebria, Parham, Homaifar, Abdollah, Saif, Mehrdad

arXiv.org Artificial IntelligenceOct-14-2025

Abstract-- As autonomous vehicles become an essential component of modern transportation, they are increasingly vulnerable to threats such as GPS spoofing attacks. This study presents an adaptive detection approach utilizing a dynamically tuned Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, designed to adjust the detection threshold (ε) in real-time. The threshold is updated based on the recursive mean and standard deviation of displacement errors between GPS and in-vehicle sensors data, but only at instances classified as non-anomalous. Furthermore, an initial threshold, determined from 120,000 clean data samples, ensures the capability to identify even subtle and gradual GPS spoofing attempts from the beginning. T o assess the performance of the proposed method, five different subsets from the real-world Honda Research Institute Driving Dataset (HDD) are selected to simulate both large and small magnitude GPS spoofing attacks. The modified algorithm effectively identifies turn-by-turn, stop, overshoot, and multiple small biased spoofing attacks, achieving detection accuracies of 98.62 1%, 99.96 0.1%, 99.88 0.1%, and 98.38 0.1%, respectively. This work provides a substantial advancement in enhancing the security and safety of A Vs against GPS spoofing threats.

artificial intelligence, machine learning, vehicle, (18 more...)

arXiv.org Artificial Intelligence

2510.10766

Country: North America > United States > Alabama (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Scalable DBSCAN with Random Projections

Neural Information Processing SystemsOct-9-2025, 22:35:43 GMT

Theoretically, sDBSCAN preserves the DBSCAN's clustering structure under mild

core point, minpt, sdbscan, (15 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

A Fuzzy Logic-Based Framework for Explainable Machine Learning in Big Data Analytics

Yesmin, Farjana, Shirmin, Nusrat

arXiv.org Artificial IntelligenceOct-8-2025

The growing complexity of machine learning (ML) models in big data analytics, especially in domains such as environmental monitoring, highlights the critical need for interpretability and explainability to promote trust, ethical considerations, and regulatory adherence (e.g., GDPR). Traditional "black-box" models obstruct transparency, whereas post-hoc explainable AI (XAI) techniques like LIME and SHAP frequently compromise accuracy or fail to deliver inherent insights. This paper presents a novel framework that combines type-2 fuzzy sets, granular computing, and clustering to boost explainability and fairness in big data environments. When applied to the UCI Air Quality dataset, the framework effectively manages uncertainty in noisy sensor data, produces linguistic rules, and assesses fairness using silhouette scores and entropy. Key contributions encompass: (1) A type-2 fuzzy clustering approach that enhances cohesion by about 4% compared to type-1 methods (silhouette 0.365 vs. 0.349) and improves fairness (entropy 0.918); (2) Incorporation of fairness measures to mitigate biases in unsupervised scenarios; (3) A rule-based component for intrinsic XAI, achieving an average coverage of 0.65; (4) Scalable assessments showing linear runtime (roughly 0.005 seconds for sampled big data sizes). Experimental outcomes reveal superior performance relative to baselines such as DBSCAN and Agglomerative Clustering in terms of interpretability, fairness, and efficiency. Notably, the proposed method achieves a 4% improvement in silhouette score over type-1 fuzzy clustering and outperforms baselines in fairness (entropy reduction by up to 1%) and efficiency.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.0512

Country: North America > United States (0.15)

Genre: Research Report (1.00)

Industry:

Energy (0.69)
Law (0.49)
Information Technology (0.49)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Constrained Centroid Clustering: A Novel Approach for Compact and Structured Partitioning

Veeramachaneni, Sowmini Devi, Garimella, Ramamurthy

arXiv.org Machine LearningAug-19-2025

This paper presents Constrained Centroid Clustering (CCC), a method that extends classical centroid-based clustering by enforcing a constraint on the maximum distance between the cluster center and the farthest point in the cluster. Using a Lagrangian formulation, we derive a closed-form solution that maintains interpretability while controlling cluster spread. To evaluate CCC, we conduct experiments on synthetic circular data with radial symmetry and uniform angular distribution. Using ring-wise, sector-wise, and joint entropy as evaluation metrics, we show that CCC achieves more compact clusters by reducing radial spread while preserving angular structure, outperforming standard methods such as K-means and GMM. The proposed approach is suitable for applications requiring structured clustering with spread control, including sensor networks, collaborative robotics, and interpretable pattern analysis.

cluster center, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2508.12758

Country:

Asia > India > Telangana > Hyderabad (0.05)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Asia > India > Andhra Pradesh > Kakinada (0.04)

Genre:

Research Report > Promising Solution (0.50)
Overview > Innovation (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Enhanced Sparse Point Cloud Data Processing for Privacy-aware Human Action Recognition

Tunau, Maimunatu, Zakka, Vincent Gbouna, Dai, Zhuangzhuang

arXiv.org Artificial IntelligenceAug-15-2025

Human Action Recognition (HAR) plays a crucial role in healthcare, fitness tracking, and ambient assisted living technologies. While traditional vision based HAR systems are effective, they pose privacy concerns. mmWave radar sensors offer a privacy preserving alternative but present challenges due to the sparse and noisy nature of their point cloud data. In the literature, three primary data processing methods: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the Hungarian Algorithm, and Kalman Filtering have been widely used to improve the quality and continuity of radar data. However, a comprehensive evaluation of these methods, both individually and in combination, remains lacking. This paper addresses that gap by conducting a detailed performance analysis of the three methods using the MiliPoint dataset. We evaluate each method individually, all possible pairwise combinations, and the combination of all three, assessing both recognition accuracy and computational cost. Furthermore, we propose targeted enhancements to the individual methods aimed at improving accuracy. Our results provide crucial insights into the strengths and trade-offs of each method and their integrations, guiding future work on mmWave based HAR systems

artificial intelligence, dbscan, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.10469

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Software (0.63)
Health & Medicine > Consumer Health (0.54)
Information Technology > Security & Privacy (0.48)
Information Technology > Smart Houses & Appliances (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback