AITopics | agglomerative clustering

agglomerative clustering

A Fuzzy Logic-Based Framework for Explainable Machine Learning in Big Data Analytics

Yesmin, Farjana, Shirmin, Nusrat

arXiv.org Artificial Intelligence, Oct-8-2025

The growing complexity of machine learning (ML) models in big data analytics, especially in domains such as environmental monitoring, highlights the critical need for interpretability and explainability to promote trust, ethical considerations, and regulatory adherence (e.g., GDPR). Traditional "black-box" models obstruct transparency, whereas post-hoc explainable AI (XAI) techniques like LIME and SHAP frequently compromise accuracy or fail to deliver inherent insights. This paper presents a novel framework that combines type-2 fuzzy sets, granular computing, and clustering to boost explainability and fairness in big data environments. When applied to the UCI Air Quality dataset, the framework effectively manages uncertainty in noisy sensor data, produces linguistic rules, and assesses fairness using silhouette scores and entropy. Key contributions encompass: (1) A type-2 fuzzy clustering approach that enhances cohesion by about 4% compared to type-1 methods (silhouette 0.365 vs. 0.349) and improves fairness (entropy 0.918); (2) Incorporation of fairness measures to mitigate biases in unsupervised scenarios; (3) A rule-based component for intrinsic XAI, achieving an average coverage of 0.65; (4) Scalable assessments showing linear runtime (roughly 0.005 seconds for sampled big data sizes). Experimental outcomes reveal superior performance relative to baselines such as DBSCAN and Agglomerative Clustering in terms of interpretability, fairness, and efficiency. Notably, the proposed method achieves a 4% improvement in silhouette score over type-1 fuzzy clustering and outperforms baselines in fairness (entropy reduction by up to 1%) and efficiency.
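The silhouette score used above to compare cohesion can be illustrated with a minimal sketch. It assumes Euclidean distance and hard cluster labels; the function name and the toy data are hypothetical and are not taken from the paper, whose fuzzy memberships would need an extra step to turn into hard labels.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient (needs at least two clusters, not all points identical)."""
    # Group the points by their cluster label.
    members = {}
    for p, lab in zip(points, labels):
        members.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = members[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in members.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
```

Values near 1 indicate compact, well-separated clusters; values near 0 indicate overlapping ones.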

artificial intelligence, data mining, machine learning, (17 more...)

2510.0512

Country: North America > United States (0.15)

Genre: Research Report (1.00)

Industry:

Energy (0.69)
Law (0.49)
Information Technology (0.49)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Flexible Bivariate Beta Mixture Model: A Probabilistic Approach for Clustering Complex Data Structures

Hsu, Yung-Peng, Chen, Hung-Hsuan

arXiv.org Artificial Intelligence, Feb-27-2025

Clustering, an unsupervised learning method, is widely used in various applications, including image analysis, information retrieval, text analysis, bioinformatics, and many more [1, 2, 3, 4]. Clustering helps uncover the underlying structure of the data, facilitates data summarization, and sometimes serves as a preprocessing step for other algorithms [2]. Despite its widespread use, one of the primary challenges many traditional clustering algorithms face is that they often assume that the data points form clusters with convex shapes. For example, centroid-based algorithms like k-means and distribution-based models like Gaussian Mixture Models (GMM) typically produce clusters that are hyperspherical or ellipsoidal [5]. Although this assumption simplifies the clustering process, it restricts the flexibility of these models to handle complex data distributions that do not conform to convex shapes.

beta distribution, bivariate beta distribution, dataset, (14 more...)

2502.19938

Country:

Asia > Taiwan (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Unsupervised Clustering Approaches for Autism Screening: Achieving 95.31% Accuracy with a Gaussian Mixture Model

Fink, Nora

arXiv.org Artificial Intelligence, Feb-20-2025

Autism spectrum disorder (ASD) remains a challenging condition to diagnose effectively and promptly, despite global efforts in public health, clinical screening, and scientific research (1). Traditional diagnostic methods, primarily reliant on supervised learning approaches, presuppose the availability of labeled data, which can be both time-consuming and resource-intensive to obtain (2). Unsupervised learning, in contrast, offers a means of gaining insights from unlabeled datasets in a manner that can expedite or support the diagnostic process (3). This paper explores the use of four distinct unsupervised clustering algorithms, namely K-Means, Gaussian Mixture Model (GMM), Agglomerative Clustering, and DBSCAN, to analyze a publicly available dataset of 704 adult individuals screened for ASD. After extensive hyperparameter tuning via cross-validation, the study documents how the Gaussian Mixture Model achieved the highest clustering-to-label accuracy (95.31%) when mapped to the original ASD/NO classification (4). Other key performance metrics included the Adjusted Rand Index (ARI) and silhouette scores, which further illustrated the internal coherence of each cluster. The dataset underwent preprocessing procedures including data cleaning, label encoding of categorical features, and standard scaling, followed by a thorough cross-validation approach to assess and compare the four clustering methods (5). These results highlight the significant potential of unsupervised methods in assisting ASD screening, especially in contexts where labeled data may be sparse, uncertain, or prohibitively expensive to obtain. With continued methodological refinements, unsupervised approaches hold promise for augmenting early detection initiatives and guiding resource allocation to individuals at high risk.
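A clustering-to-label accuracy of the kind reported above requires mapping each unlabeled cluster to a class. The abstract does not say how that mapping was done, so the sketch below assumes a simple majority vote per cluster; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_to_label_accuracy(cluster_ids, true_labels):
    """Map each cluster to its most common true label, then score the mapping."""
    # Collect the true labels of the samples that fell into each cluster.
    by_cluster = defaultdict(list)
    for c, y in zip(cluster_ids, true_labels):
        by_cluster[c].append(y)
    # Assumed mapping rule: majority vote inside each cluster.
    mapping = {c: Counter(ys).most_common(1)[0][0] for c, ys in by_cluster.items()}
    # A sample counts as correct when its cluster's mapped label equals its true label.
    correct = sum(mapping[c] == y for c, y in zip(cluster_ids, true_labels))
    return correct / len(true_labels), mapping


# Hypothetical toy example with the ASD/NO classes named in the abstract.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 0]
true_labels = ["NO", "NO", "ASD", "ASD", "ASD", "ASD", "NO", "NO"]
accuracy, mapping = cluster_to_label_accuracy(cluster_ids, true_labels)
```

Because the mapping is chosen after seeing the labels, this figure is an optimistic measure of agreement rather than a predictive accuracy, which is one reason label-free metrics such as ARI and silhouette scores are reported alongside it.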

accuracy, agglomerative clustering, k-means, (14 more...)

2503.05746

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Agglomerative Clustering of Simulation Output Distributions Using Regularized Wasserstein Distance

Ghasemloo, Mohammadmahdi, Eckman, David J.

arXiv.org Machine Learning, Jul-16-2024

We investigate the use of clustering methods on data produced by a stochastic simulator, with applications in anomaly detection, pre-optimization, and online monitoring. We introduce an agglomerative clustering algorithm that clusters multivariate empirical distributions using the regularized Wasserstein distance and apply the proposed methodology to a call-center model.
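To give a rough sense of what a regularized Wasserstein distance between two empirical distributions involves, here is a minimal sketch using entropy regularization solved by Sinkhorn iterations. The abstract does not specify the exact formulation, so the squared Euclidean ground cost, the uniform sample weights, the `eps` and `n_iter` values, and the function name are all assumptions made for illustration.

```python
from math import dist, exp


def sinkhorn_cost(xs, ys, eps=1.0, n_iter=200):
    """Transport cost of the entropy-regularized plan between two sample sets."""
    n, m = len(xs), len(ys)
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform weights on the samples (assumed)
    # Ground cost: squared Euclidean distance between samples (assumed).
    cost = [[dist(x, y) ** 2 for y in ys] for x in xs]
    # Gibbs kernel; a small eps with large costs can underflow in this naive version.
    kernel = [[exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The plan is P[i][j] = u[i] * kernel[i][j] * v[j]; return its total cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )


# Hypothetical simulation outputs: one reference sample set and two candidates.
reference = [(0.0, 0.0), (1.0, 0.0)]
near = [(0.0, 0.1), (1.0, 0.1)]
far = [(3.0, 0.0), (4.0, 0.0)]
d_near = sinkhorn_cost(reference, near)
d_far = sinkhorn_cost(reference, far)
```

A pairwise matrix of such distances could then serve as the dissimilarity input to an agglomerative procedure, with each empirical distribution treated as a single item to be clustered.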

agglomerative clustering, regularized wasserstein distance, simulation output distribution, (1 more...)

2407.121

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Hierarchical Clustering: A Practical Introduction of Agglomerative and Divisive Methods

#artificialintelligence, Jan-6-2023, 05:30:49 GMT

In this article, we discuss hierarchical clustering in detail: why we need it, how it works, what types exist, and which datasets it suits. Before moving on to hierarchical clustering, we should ask why we need it at all when we already have K-Means clustering. If you have studied K-Means, you know that the algorithm forms clusters based on each point's distance to a centroid. It works well on datasets with well-defined boundaries and few outliers. In the picture above, K-Means works well, but on more complex datasets problems arise and K-Means does not work properly. As you can see in the picture below, K-Means fails to form the clusters.

artificial intelligence, machine learning, matrix, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Breaking down the agglomerative clustering process

#artificialintelligence, Dec-27-2019, 13:17:56 GMT

In machine learning, unsupervised learning refers to models that infer patterns in the data without any guidance or labels. Many models belong to the unsupervised learning family, but one of my favorites is Agglomerative Clustering. Agglomerative Clustering, or bottom-up clustering, starts with every data point as its own cluster (also called a leaf); the distance between every pair of clusters is then calculated. The two clusters with the shortest distance between them merge, creating what we call a node. The distance from the newly formed cluster to every other cluster is then calculated again.
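The bottom-up process described above can be sketched in a few lines of plain Python. This is a minimal illustration, not an efficient implementation: it assumes single linkage (cluster distance is the distance between the closest pair of members) and Euclidean distance, and the function names and sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of members (assumed linkage)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    # Every data point starts as its own cluster (a leaf).
    clusters = [[p] for p in points]
    merges = []  # history of (cluster_a, cluster_b, distance): the nodes
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the shortest distance between them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], single_linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with the new node (j > i, so pop j first).
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return clusters, merges


# Hypothetical sample: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerate(points, n_clusters=3)
```

Recomputing every pairwise cluster distance on each pass keeps the sketch close to the description but is slow for large datasets; library implementations cache and update the distances instead.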

agglomerative clustering, individual cluster, learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Graph Degree Linkage: Agglomerative Clustering on a Directed Graph

Zhang, Wei, Wang, Xiaogang, Zhao, Deli, Tang, Xiaoou

arXiv.org Machine Learning, Aug-24-2012

This paper proposes a simple but effective graph-based agglomerative algorithm for clustering high-dimensional data. We explore the different roles of two fundamental concepts in graph theory, indegree and outdegree, in the context of clustering. The average indegree reflects the density near a sample, and the average outdegree characterizes the local geometry around a sample. Based on such insights, we define the affinity measure of clusters via the product of average indegree and average outdegree. The product-based affinity makes our algorithm robust to noise. The algorithm has three main advantages: good performance, easy implementation, and high computational efficiency. We test the algorithm on two fundamental computer vision problems: image clustering and object matching. Extensive experiments demonstrate that it outperforms the state of the art in both applications.

affinity, algorithm, outdegree, (14 more...)

1208.5092

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
