dbscan
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
Mass Distribution versus Density Distribution in the Context of Clustering
Ting, Kai Ming, Zhu, Ye, Zhang, Hang, Liang, Tianrun
This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has its fundamental limitation -- high-density bias, irrespective of the algorithms used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using the mass distribution as a better foundation, we propose a new algorithm which maximizes the total mass of all clusters, called mass-maximization clustering (MMC). The algorithm can be easily changed to maximize the total density of all clusters in order to examine the fundamental limitation of using density distribution versus mass distribution. The key advantage of the MMC over the density-maximization clustering is that the maximization is conducted without a bias towards dense clusters.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Oceania > Australia (0.04)
- North America > United States > Michigan (0.04)
- Research Report (1.00)
- Overview (1.00)
SACA: Selective Attention-Based Clustering Algorithm
Bilehsavar, Meysam Shirdel, Ghaedi, Razieh, Taheri, Samira Seyed, Fan, Xinqi, O'Reilly, Christian
Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the requirement of critical parameter tuning by users, which typically requires significant domain expertise. This paper introduces a novel density-based clustering algorithm loosely inspired by the concept of selective attention, designed to minimize reliance on parameter tuning for most applications. The proposed method computes an adaptive threshold to exclude sparsely distributed points and outliers, constructs an initial cluster framework, and subsequently reintegrates the filtered points to refine the final results. Extensive experiments on diverse benchmark datasets demonstrate the robustness, accuracy, and ease of use of the proposed approach, establishing it as a powerful alternative to conventional density-based clustering techniques.
- North America > United States > South Carolina (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- Europe > Germany (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Emergent Lexical Semantics in Neural Language Models: Testing Martin's Law on LLM-Generated Text
We present the first systematic investigation of Martin's Law - the empirical relationship between word frequency and polysemy - in text generated by neural language models during training. Using DBSCAN clustering of contextualized embeddings as an operationalization of word senses, we analyze four Pythia models (70M-1B parameters) across 30 training checkpoints. Our results reveal a non-monotonic developmental trajectory: Martin's Law emerges around checkpoint 100, reaches peak correlation (r > 0.6) at checkpoint 104, then degrades by checkpoint 105. Smaller models (70M, 160M) experience catastrophic semantic collapse at late checkpoints, while larger models (410M, 1B) show graceful degradation. The frequency-specificity trade-off remains stable (r $\approx$ -0.3) across all models. These findings suggest that compliance with linguistic regularities in LLM-generated text is not monotonically increasing with training, but instead follows a balanced trajectory with an optimal semantic window. This work establishes a novel methodology for evaluating emergent linguistic structure in neural language models.
GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN
Mohammadi, Ahmad, Ahmari, Reza, Hemmati, Vahid, Owusu-Ambrose, Frederick, Mahmoud, Mahmoud Nabil, Kebria, Parham, Homaifar, Abdollah, Saif, Mehrdad
Abstract-- As autonomous vehicles become an essential component of modern transportation, they are increasingly vulnerable to threats such as GPS spoofing attacks. This study presents an adaptive detection approach utilizing a dynamically tuned Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, designed to adjust the detection threshold (ε) in real-time. The threshold is updated based on the recursive mean and standard deviation of displacement errors between GPS and in-vehicle sensors data, but only at instances classified as non-anomalous. Furthermore, an initial threshold, determined from 120,000 clean data samples, ensures the capability to identify even subtle and gradual GPS spoofing attempts from the beginning. T o assess the performance of the proposed method, five different subsets from the real-world Honda Research Institute Driving Dataset (HDD) are selected to simulate both large and small magnitude GPS spoofing attacks. The modified algorithm effectively identifies turn-by-turn, stop, overshoot, and multiple small biased spoofing attacks, achieving detection accuracies of 98.62 1%, 99.96 0.1%, 99.88 0.1%, and 98.38 0.1%, respectively. This work provides a substantial advancement in enhancing the security and safety of A Vs against GPS spoofing threats.
- North America > United States > Alabama > Tuscaloosa County > Tuscaloosa (0.14)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > North Carolina > Guilford County > Greensboro (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.68)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
A Fuzzy Logic-Based Framework for Explainable Machine Learning in Big Data Analytics
Yesmin, Farjana, Shirmin, Nusrat
The growing complexity of machine learning (ML) models in big data analytics, especially in domains such as environmental monitoring, highlights the critical need for interpretability and explainability to promote trust, ethical considerations, and regulatory adherence (e.g., GDPR). Traditional "black-box" models obstruct transparency, whereas post-hoc explainable AI (XAI) techniques like LIME and SHAP frequently compromise accuracy or fail to deliver inherent insights. This paper presents a novel framework that combines type-2 fuzzy sets, granular computing, and clustering to boost explainability and fairness in big data environments. When applied to the UCI Air Quality dataset, the framework effectively manages uncertainty in noisy sensor data, produces linguistic rules, and assesses fairness using silhouette scores and entropy. Key contributions encompass: (1) A type-2 fuzzy clustering approach that enhances cohesion by about 4% compared to type-1 methods (silhouette 0.365 vs. 0.349) and improves fairness (entropy 0.918); (2) Incorporation of fairness measures to mitigate biases in unsupervised scenarios; (3) A rule-based component for intrinsic XAI, achieving an average coverage of 0.65; (4) Scalable assessments showing linear runtime (roughly 0.005 seconds for sampled big data sizes). Experimental outcomes reveal superior performance relative to baselines such as DBSCAN and Agglomerative Clustering in terms of interpretability, fairness, and efficiency. Notably, the proposed method achieves a 4% improvement in silhouette score over type-1 fuzzy clustering and outperforms baselines in fairness (entropy reduction by up to 1%) and efficiency.
- Europe > Switzerland (0.05)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- Energy (0.69)
- Law (0.49)
- Information Technology (0.49)
Constrained Centroid Clustering: A Novel Approach for Compact and Structured Partitioning
Veeramachaneni, Sowmini Devi, Garimella, Ramamurthy
This paper presents Constrained Centroid Clustering (CCC), a method that extends classical centroid-based clustering by enforcing a constraint on the maximum distance between the cluster center and the farthest point in the cluster. Using a Lagrangian formulation, we derive a closed-form solution that maintains interpretability while controlling cluster spread. To evaluate CCC, we conduct experiments on synthetic circular data with radial symmetry and uniform angular distribution. Using ring-wise, sector-wise, and joint entropy as evaluation metrics, we show that CCC achieves more compact clusters by reducing radial spread while preserving angular structure, outperforming standard methods such as K-means and GMM. The proposed approach is suitable for applications requiring structured clustering with spread control, including sensor networks, collaborative robotics, and interpretable pattern analysis.
- Asia > India > Telangana > Hyderabad (0.05)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Asia > India > Andhra Pradesh > Kakinada (0.04)
- Research Report > Promising Solution (0.50)
- Overview > Innovation (0.40)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Enhanced Sparse Point Cloud Data Processing for Privacy-aware Human Action Recognition
Tunau, Maimunatu, Zakka, Vincent Gbouna, Dai, Zhuangzhuang
Human Action Recognition (HAR) plays a crucial role in healthcare, fitness tracking, and ambient assisted living technologies. While traditional vision based HAR systems are effective, they pose privacy concerns. mmWave radar sensors offer a privacy preserving alternative but present challenges due to the sparse and noisy nature of their point cloud data. In the literature, three primary data processing methods: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the Hungarian Algorithm, and Kalman Filtering have been widely used to improve the quality and continuity of radar data. However, a comprehensive evaluation of these methods, both individually and in combination, remains lacking. This paper addresses that gap by conducting a detailed performance analysis of the three methods using the MiliPoint dataset. We evaluate each method individually, all possible pairwise combinations, and the combination of all three, assessing both recognition accuracy and computational cost. Furthermore, we propose targeted enhancements to the individual methods aimed at improving accuracy. Our results provide crucial insights into the strengths and trade-offs of each method and their integrations, guiding future work on mmWave based HAR systems
- North America > United States > North Carolina (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- Asia > China (0.04)
- Information Technology > Software (0.63)
- Health & Medicine > Consumer Health (0.54)
- Information Technology > Security & Privacy (0.48)
- Information Technology > Smart Houses & Appliances (0.46)
Scalable Varied-Density Clustering via Graph Propagation
Pham, Ninh, Zheng, Yingtao, Phibbs, Hugo
We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based clustering with graph connectivity, enabling the use of efficient graph propagation techniques developed in network science. To ensure scalability, we introduce a density-aware neighborhood propagation algorithm and leverage advanced random projection methods to construct approximate neighborhood graphs. Our approach significantly reduces computational cost while preserving clustering quality. Empirically, it scales to datasets with millions of points in minutes and achieves competitive accuracy compared to existing baselines.
- Europe > Netherlands > South Holland > Leiden (0.09)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.05)
- Asia > China (0.04)