AITopics | density-based clustering

Collaborating Authors

density-based clustering

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unimodal Strategies in Density-Based Clustering

Nir, Oron, Tenenbaum, Jay, Shamir, Ariel

arXiv.org Artificial IntelligenceJun-30-2025

Density-based clustering methods often surpass centroid-based counterparts, when addressing data with noise or arbitrary data distributions common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods regarding the relation between the number of clusters and the neighborhood radius of core points - we empirically show that it is nearly unimodal, and support this claim theoretically in a specific setting. We leverage this property to devise new strategies for finding appropriate values for the radius more efficiently based on the Ternary Search algorithm. This is especially important for large scale data that is high-dimensional, where parameter tuning is computationally intensive. We validate our methodology through extensive applications across a range of high-dimensional, large-scale NLP, Audio, and Computer Vision tasks, demonstrating its practical effectiveness and robustness. This work not only offers a significant advancement in parameter control for density-based clustering but also broadens the understanding regarding the relations between their guiding parameters. Our code is available at https://github.com/oronnir/UnimodalStrategies.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.21695

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Israel (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

DECWA : Density-Based Clustering using Wasserstein Distance

Malki, Nabil El, Cugny, Robin, Teste, Olivier, Ravat, Franck

arXiv.org Artificial IntelligenceOct-25-2023

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among these methods, state-of-the-art density-based clustering methods have proven to be effective for arbitrary-shaped clusters. Despite their encouraging results, they suffer to find low-density clusters, near clusters with similar densities, and high-dimensional data. Our proposals are a new characterization of clusters and a new clustering algorithm based on spatial density and probabilistic approach. First of all, sub-clusters are built using spatial density represented as probability density function ($p.d.f$) of pairwise distances between points. A method is then proposed to agglomerate similar sub-clusters by using both their density ($p.d.f$) and their spatial distance. The key idea we propose is to use the Wasserstein metric, a powerful tool to measure the distance between $p.d.f$ of sub-clusters. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.

decwa, density-based clustering, wasserstein distance

arXiv.org Artificial Intelligence

doi: 10.1145/3340531.3412125

2310.16552

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Density-Based Clustering: DBSCAN vs. HDBSCAN

#artificialintelligenceDec-5-2022, 20:45:35 GMT

Cluster Analysis is a pertinent domain in data science that enables the grouping of similar objects into distinct subgroups. While there are different families of clustering algorithms, the most widely known is K-Means. This is a centroid-based algorithm, meaning that objects in the data are clustered by being assigned to the nearest centroid. However, a major pitfall of K-Means is its lack of detecting outliers, or noisy data points, which leads them to be classified incorrectly. Furthermore, K-Means has an intrinsic preference for globular clusters and does not work very well on data comprised of arbitrarily shaped clusters.

algorithm, density-based clustering, hdbscan, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance

Dell'Amico, Matteo

arXiv.org Machine LearningOct-16-2019

FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm. It is flexible because it empowers users to work on arbitrary data, skipping the feature extraction step that usually transforms raw data in numeric arrays letting users define an arbitrary distance function instead. It is incremental and scalable: it avoids the $\mathcal O(n^2)$ performance of other approaches in non-metric spaces and requires only lightweight computation to update the clustering when few items are added. It is hierarchical: it produces a "flat" clustering which can be expanded to a tree structure, so that users can group and/or divide clusters in sub- or super-clusters when data exploration requires so. It is density-based and approximates HDBSCAN*, an evolution of DBSCAN.

dataset, distance function, fishdbc, (16 more...)

arXiv.org Machine Learning

1910.07283

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(16 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"

Schubert, Erich, Zimek, Arthur

arXiv.org Machine LearningFeb-10-2019

This paper documents the release of the ELKI data mining framework, version 0.7.5. ELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. We will first outline the motivation for this release, the plans for the future, and then give a brief overview over the new functionality in this version. We also include an appendix presenting an overview on the overall implemented functionality.

density-based clustering, survey article, upstream oil & gas, (22 more...)

arXiv.org Machine Learning

1902.03616

Country:

North America > United States > Wisconsin (0.14)
North America > United States > Virginia (0.13)
North America > United States > New York (0.13)
(3 more...)

Genre: Research Report > Experimental Study (0.45)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Add feedback

$\alpha$-Approximation Density-based Clustering of Multi-valued Objects

Zhang, Zhilin

arXiv.org Machine LearningAug-9-2018

Zhilin Zhang Abstract Multi-valued data are commonly found in many real applications. During the process of clustering multi-valued data, most existing methods use sampling or aggregation mechanisms that cannot reflect the real distribution of objects and their instances and thus fail to obtain high-quality clusters. In this paper, a concept ofα -approximation distance is introduced to measure the connectivity between multi-valued objects by taking account of the distribution of the instances. An α -approximation density-based clustering algorithm (DBCMO) is proposed to efficiently cluster the multi-valued objects by using global and local R* tree structures. To speed up the algorithm, four pruning rules on the tree structures are implemented. Empirical studies on synthetic and real datasets demonstrate that DBCMO can efficiently and effectively discover the multi-valued object clusters. A comparison with two existing methods further shows that DBCMO can better handle a continuous decrease in the cluster density and detect clusters of varying density. Keywords Multi-valued objects· α -Approximation· Density-based· Clustering 1 Introduction Multi-valued data (Zhang et al. 2010), including multi-instance data and uncertain data, are commonly found in many real applications. The check-in data of location-based social networks are one example. Each user is an object, and he/she can have multiple check-in records associated with different temporal and spatial information. The observation data of dynamic objects, such as seismic activity, sea floor bathymetry, and sea height, are other examples. Since the states of observed objects change constantly, the limited observation data can only reveal the objects' states with a certain probability. The clustering of multi-valued objects is the process of grouping objects into different partitions based on similarity measurements or connectivity calculations. Based on the mechanism used for measuring similarity or connectivity, the clustering algorithms for multi-valued objects can be divided into two main categories: aggregation-based clustering and sampling-based clustering. Aggregation-based clustering methodology first transfers the multi-valued objects into single-valued objects with an aggregation function (e.g. the mean). After that, various traditional clustering algorithms can be applied directly. Sampling-based methods obtain a sequence of sample points for each object using sampling techniques. And then the distance density function or the expected distance of two objects can be computed with the multiple discrete distance values from the samples. Both aggregation and sampling are useful in reducing computational cost, especially when there is large number of values for objects. However, determination of a proper aggregation function or sampling strategy is not trivial.

artificial intelligence, entry pair, machine learning, (19 more...)

arXiv.org Machine Learning

1808.033

Country:

Asia (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (0.47)
Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback