AITopics

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Neural Information Processing Systems, Oct-9-2025, 22:35:43 GMT

Scalable DBSCAN with Random Projections

Theoretically, sDBSCAN preserves DBSCAN's clustering structure under mild

core point, minpt, sdbscan, (15 more...)

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Neural Information Processing Systems, Aug-17-2025, 10:52:46 GMT

fdf1bc5669e8ff5ba45d02fded729feb-Supplemental.pdf

algorithm, dataset, sng-dbscan, (15 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > New York (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
(2 more...)

Industry: Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Neural Information Processing Systems, Aug-17-2025, 10:52:27 GMT

fdf1bc5669e8ff5ba45d02fded729feb-AuthorFeedback.pdf

The reviewer brings up a good point. SNG-DBSCAN recovers these clusters at rates depending on various properties of the density function. We will further clarify these constant factor dependencies.

artificial intelligence, main concern, sng-dbscan, (10 more...)

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)

Technology: Information Technology > Artificial Intelligence (0.31)

Nir, Oron, Tenenbaum, Jay, Shamir, Ariel

Unimodal Strategies in Density-Based Clustering

arXiv.org Artificial Intelligence, Jun-30-2025

Density-based clustering methods often surpass centroid-based counterparts when addressing data with noise or arbitrary data distributions, both common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods: the relation between the number of clusters and the neighborhood radius of core points is nearly unimodal. We show this empirically and support the claim theoretically in a specific setting. We leverage this property to devise new strategies, based on the Ternary Search algorithm, for finding appropriate values for the radius more efficiently. This is especially important for large-scale, high-dimensional data, where parameter tuning is computationally intensive. We validate our methodology through extensive applications across a range of high-dimensional, large-scale NLP, Audio, and Computer Vision tasks, demonstrating its practical effectiveness and robustness. This work not only offers a significant advancement in parameter control for density-based clustering but also broadens the understanding of the relations between their guiding parameters. Our code is available at https://github.com/oronnir/UnimodalStrategies.
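
The ternary-search idea in the abstract above can be illustrated with a minimal sketch. It assumes a strictly unimodal objective and is not the authors' implementation (their code is at the URL above). `toy_cluster_count` is a hypothetical stand-in for "number of clusters found at radius eps"; a real curve would be only nearly unimodal and would cost one clustering run per evaluation.

```python
from typing import Callable


def ternary_search_max(
    f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-4
) -> float:
    """Return x in [lo, hi] that approximately maximizes a unimodal f."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            # The peak cannot lie to the left of m1.
            lo = m1
        else:
            # The peak cannot lie to the right of m2.
            hi = m2
    return (lo + hi) / 2.0


def toy_cluster_count(eps: float) -> float:
    # Hypothetical objective (an assumption for illustration only):
    # a single peak at eps = 0.3.
    return 40.0 - 400.0 * (eps - 0.3) ** 2


best_eps = ternary_search_max(toy_cluster_count, 0.0, 1.0)
print(f"radius with the most clusters (toy objective): {best_eps:.3f}")
```

Each iteration discards one third of the interval at the price of two evaluations, so the number of clustering runs grows logarithmically with the desired precision rather than linearly as in a grid sweep.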

artificial intelligence, dataset, machine learning, (17 more...)

2506.21695

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Israel (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, May-9-2025

Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning

Peng, Hao, Huang, Xiang, Sun, Shuo, Zhang, Ruitong, Yu, Philip S.

DBSCAN, a well-known density-based clustering algorithm, is widely used because it identifies clusters of arbitrary shape and handles noisy data effectively. However, it struggles to produce satisfactory clusters on datasets with varying density scales, a common scenario in real-world applications. In this paper, we propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning clustering framework, namely AR-DBSCAN. First, we model the initial dataset as a two-level encoding tree and categorize the data vertices into distinct density partitions according to the information uncertainty determined in the encoding tree. Each partition is then assigned to an agent that finds the best clustering parameters without manual assistance. The allocation is density-adaptive, enabling AR-DBSCAN to handle diverse density distributions within the dataset by using distinct agents for different partitions. Second, we design an automatic parameter search process guided by multi-agent deep reinforcement learning. The process of adjusting the parameter search direction by perceiving the clustering environment is modeled as a Markov decision process. Using a weakly-supervised reward training policy network, each agent adaptively learns the optimal clustering parameters by interacting with the clusters. Third, we present a recursive search mechanism that adapts to the data's scale, enabling efficient and controlled exploration of large parameter spaces. Extensive experiments are conducted on nine artificial datasets and a real-world dataset. The results of offline and online tasks show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but is also capable of robustly finding dominant parameters.
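
For context, here is a minimal sketch of plain DBSCAN with a fixed radius and density threshold. It is the textbook baseline, not AR-DBSCAN, and the parameter names `eps` and `min_pts` are the conventional ones rather than anything taken from the paper. A single global pair of parameters is what makes the baseline brittle on data with varying density.

```python
import math
from typing import List, Sequence

NOISE = -1
UNVISITED = -2


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Brute-force O(n^2) DBSCAN; returns one cluster label per point, -1 = noise."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
labels = dbscan(points, eps=0.3, min_pts=3)
print(labels)
```

On this toy input the two tight groups form separate clusters and the isolated point at (10, 0) is labeled noise.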

evolutionary algorithm, machine learning, reinforcement learning, (17 more...)

2505.04339

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Italy (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Phu, Nguyen Thi Minh, Loc, Duong Tan, Duy, Vo Nguyen Le

Statistical Inference for Clustering-based Anomaly Detection

arXiv.org Machine Learning, Apr-25-2025

Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this method lacks the ability to guarantee the reliability of the detected anomalies. In this paper, we propose SI-CLAD (Statistical Inference for CLustering-based Anomaly Detection), a novel statistical framework for testing the clustering-based AD results. The key strength of SI-CLAD lies in its ability to rigorously control the probability of falsely identifying anomalies, maintaining it below a pre-specified significance level $\alpha$ (e.g., $\alpha = 0.05$). By analyzing the selection mechanism inherent in clustering-based AD and leveraging the Selective Inference (SI) framework, we prove that false detection control is attainable. Moreover, we introduce a strategy to boost the true detection rate, enhancing the overall performance of SI-CLAD. Extensive experiments on synthetic and real-world datasets provide strong empirical support for our theoretical findings, showcasing the superior performance of the proposed method.

artificial intelligence, data mining, machine learning, (16 more...)

2504.18633

Country:

North America > United States > Wisconsin (0.04)
Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
Asia > Japan (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine > Therapeutic Area (0.95)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Artificial Intelligence, Oct-17-2024

GBCT: An Efficient and Adaptive Granular-Ball Clustering Algorithm for Complex Data

Xia, Shuyin, Shi, Bolun, Wang, Yifan, Xie, Jiang, Wang, Guoyin, Gao, Xinbo

Traditional clustering algorithms often focus on the most fine-grained information and achieve clustering by calculating the distance between each pair of data points or by performing other point-based calculations. This approach is inconsistent with the cognitive mechanism of "global precedence" in the human brain, resulting in poor efficiency, generalization ability, and robustness. To address this problem, we propose a new clustering algorithm called granular-ball clustering (GBCT) via granular-ball computing. First, GBCT generates a smaller number of granular-balls to represent the original data and forms clusters according to the relationships between granular-balls instead of the traditional relationships between points. At the same time, its coarse-grained characteristics are not susceptible to noise, and the algorithm is efficient and robust. In addition, because granular-balls can fit various complex data, GBCT performs much better on non-spherical data sets than other traditional clustering methods. The new coarse-granularity representation method and cluster formation mode of GBCT can also be used to improve other traditional methods.

artificial intelligence, data mining, machine learning, (20 more...)

2410.13917

Country:

Asia > China > Chongqing Province > Chongqing (0.06)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > Japan > Honshū > Chūbu > Shizuoka Prefecture > Shizuoka (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Dennehy, Andrew, Zou, Xiaoyu, Semnani, Shabnam J., Fialko, Yuri, Cloninger, Alexander

LINSCAN -- A Linearity Based Clustering Algorithm

arXiv.org Artificial Intelligence, Jun-25-2024

DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approximating their local neighborhoods and leveraging a distance function derived from the Kullback-Leibler divergence, LINSCAN can detect and distinguish lineated clusters that are spatially close but have orthogonal covariances. We demonstrate how LINSCAN can be applied to seismic data to identify active faults, including intersecting faults, and determine their orientation. Finally, we discuss the properties that a generalization of DBSCAN and OPTICS must have in order to retain the stability benefits of these algorithms.
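
To see why a KL-based distance separates nearby clusters with orthogonal covariances, consider the sketch below. The abstract does not give LINSCAN's exact distance function, so the symmetrized KL divergence between two 2-D Gaussians used here is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]
Mat = Tuple[Tuple[float, float], Tuple[float, float]]


def det2(s: Mat) -> float:
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]


def inv2(s: Mat) -> Mat:
    d = det2(s)
    return ((s[1][1] / d, -s[0][1] / d), (-s[1][0] / d, s[0][0] / d))


def kl_gauss2(m0: Vec, s0: Mat, m1: Vec, s1: Mat) -> float:
    """KL( N(m0, s0) || N(m1, s1) ) for 2-D Gaussians, closed form."""
    inv1 = inv2(s1)
    # trace(inv1 @ s0)
    trace = sum(inv1[i][j] * s0[j][i] for i in range(2) for j in range(2))
    dx = (m1[0] - m0[0], m1[1] - m0[1])
    # Mahalanobis term (m1 - m0)^T inv1 (m1 - m0)
    maha = sum(dx[i] * inv1[i][j] * dx[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + maha - 2.0 + math.log(det2(s1) / det2(s0)))


def sym_kl(m0: Vec, s0: Mat, m1: Vec, s1: Mat) -> float:
    """Symmetrized KL; an assumed stand-in for the paper's distance."""
    return 0.5 * (kl_gauss2(m0, s0, m1, s1) + kl_gauss2(m1, s1, m0, s0))


# Two elongated neighborhoods centered at the same location but oriented
# at right angles, as with intersecting faults.
center = (0.0, 0.0)
horizontal = ((1.0, 0.0), (0.0, 0.01))
vertical = ((0.01, 0.0), (0.0, 1.0))

same = sym_kl(center, horizontal, center, horizontal)
crossed = sym_kl(center, horizontal, center, vertical)
print(same, crossed)
```

The Euclidean distance between the two centers is zero, yet the divergence between the crossed neighborhoods is large, which is the effect the abstract relies on to keep intersecting lineated clusters apart.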

algorithm, artificial intelligence, machine learning, (18 more...)

2406.17952

Country: North America > United States > California (0.47)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.83)