AITopics

doi: 10.1109/ICDM58522.2023.00075

2401.07551

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(12 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Vardakas, Georgios, Pavlopoulos, John, Likas, Aristidis

Silhouette Aggregation: From Micro to Macro

arXiv.org Artificial IntelligenceJan-15-2024

Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the scores of all the points in the dataset are either (micro) averaged into a single value or averaged at the cluster level and then (macro) averaged. As we illustrate in this work, by using a synthetic example, the micro-averaging strategy is sensitive both to cluster imbalance and outliers (background noise) while macro-averaging is far more robust to both. Furthermore, the latter allows cluster-balanced sampling which yields robust computation of the silhouette score. By conducting an experimental study on eight real-world datasets, estimating the ground truth number of clusters, we show that both coefficients, micro and macro, should be considered.

background noise, dataset, silhouette score, (11 more...)

2401.05831

Country:

Europe > Greece > Epirus > Ioannina (0.05)
Europe > Greece > Attica > Athens (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.66)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceJan-14-2024

A Rapid Review of Clustering Algorithms

Yin, Hui, Aryani, Amir, Petrie, Stephen, Nambissan, Aishwarya, Astudillo, Aland, Cao, Shengyuan

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data organization and analysis, and social media. Numerous clustering algorithms exist, with ongoing developments introducing new ones. Each algorithm possesses its own set of strengths and weaknesses, and as of now, there is no universally applicable algorithm for all tasks. In this work, we analyzed existing clustering algorithms and classify mainstream algorithms across five different dimensions: underlying principles and characteristics, data point assignment to clusters, dataset capacity, predefined cluster numbers and application area. This classification facilitates researchers in understanding clustering algorithms from various perspectives and helps them identify algorithms suitable for solving specific tasks. Finally, we discussed the current trends and potential future directions in clustering algorithms. We also identified and discussed open challenges and unresolved issues in the field.

algorithm, clustering, dataset, (15 more...)

2401.07389

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(5 more...)

Genre: Overview (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Woubie, Abraham, Solomon, Enoch, Emiru, Eyael Solomon

Image Clustering using Restricted Boltzman Machine

arXiv.org Artificial IntelligenceJan-14-2024

In various verification systems, Restricted Boltzmann Machines (RBMs) have demonstrated their efficacy in both front-end and back-end processes. In this work, we propose the use of RBMs to the image clustering tasks. RBMs are trained to convert images into image embeddings. We employ the conventional bottom-up Agglomerative Hierarchical Clustering (AHC) technique. To address the challenge of limited test face image data, we introduce Agglomerative Hierarchical Clustering based Method for Image Clustering using Restricted Boltzmann Machine (AHC-RBM) with two major steps. Initially, a universal RBM model is trained using all available training dataset. Subsequently, we train an adapted RBM model using the data from each test image. Finally, RBM vectors which is the embedding vector is generated by concatenating the visible-to-hidden weight matrices of these adapted models, and the bias vectors. These vectors effectively preserve class-specific information and are utilized in image clustering tasks. Our experimental results, conducted on two benchmark image datasets (MS-Celeb-1M and DeepFashion), demonstrate that our proposed approach surpasses well-known clustering algorithms such as k-means, spectral clustering, and approximate Rank-order.

computer vision, proceedings, vector, (12 more...)

2312.13845

Country:

North America > United States > Virginia > Richmond (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Zhang, Anderson Y., Zhou, Harrison H.

Leave-one-out Singular Subspace Perturbation Analysis for Spectral Clustering

arXiv.org Artificial IntelligenceJan-14-2024

The singular subspaces perturbation theory is of fundamental importance in probability and statistics. It has various applications across different fields. We consider two arbitrary matrices where one is a leave-one-column-out submatrix of the other one and establish a novel perturbation upper bound for the distance between the two corresponding singular subspaces. It is well-suited for mixture models and results in a sharper and finer statistical analysis than classical perturbation bounds such as Wedin's Theorem. Empowered by this leave-one-out perturbation theory, we provide a deterministic entrywise analysis for the performance of spectral clustering under mixture models. Our analysis leads to an explicit exponential error rate for spectral clustering of sub-Gaussian mixture models. For the mixture of isotropic Gaussians, the rate is optimal under a weaker signal-to-noise condition than that of L{\"o}ffler et al. (2021).

mixture model, spectral, theorem 2, (12 more...)

2205.14855

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Laber, Eduardo S., Murtinho, Lucas

Optimization of Inter-group Criteria for Clustering with Minimum Size Constraints

arXiv.org Artificial IntelligenceJan-13-2024

Internal measures that are used to assess the quality of a clustering usually take into account intra-group and/or inter-group criteria. There are many papers in the literature that propose algorithms with provable approximation guarantees for optimizing the former. However, the optimization of inter-group criteria is much less understood. Here, we contribute to the state-of-the-art of this literature by devising algorithms with provable guarantees for the maximization of two natural inter-group criteria, namely the minimum spacing and the minimum spanning tree spacing. The former is the minimum distance between points in different groups while the latter captures separability through the cost of the minimum spanning tree that connects all groups. We obtain results for both the unrestricted case, in which no constraint on the clusters is imposed, and for the constrained case where each group is required to have a minimum number of points. Our constraint is motivated by the fact that the popular Single Linkage, which optimizes both criteria in the unrestricted case, produces clusterings with many tiny groups. To complement our work, we present an empirical study with 10 real datasets, providing evidence that our methods work very well in practical settings.

algorithm, algorithm 2, mst-sp, (15 more...)

2401.07091

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

CCFC: Bridging Federated Clustering and Contrastive Learning

Yan, Jie, Liu, Jing, Zhang, Zhong-Yuan

Federated clustering, an essential extension of centralized clustering for federated scenarios, enables multiple data-holding clients to collaboratively group data while keeping their data locally. In centralized scenarios, clustering driven by representation learning has made significant advancements in handling high-dimensional complex data. However, the combination of federated clustering and representation learning remains underexplored. To bridge this, we first tailor a cluster-contrastive model for learning clustering-friendly representations. Then, we harness this model as the foundation for proposing a new federated clustering method, named cluster-contrastive federated clustering (CCFC). Benefiting from representation learning, the clustering performance of CCFC even double those of the best baseline methods in some cases. Compared to the most related baseline, the benefit results in substantial NMI score improvements of up to 0.4155 on the most conspicuous case. Moreover, CCFC also shows superior performance in handling device failures from a practical viewpoint.

learning, representation, scenario, (16 more...)

2401.06634

Country:

North America > United States > Virginia (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

Zhu, Pengfei, Wang, Qian, Wang, Yu, Li, Jialu, Hu, Qinghua

Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes whose neighbors are in different groups require significantly different emphases on SSL tasks. In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance. We design an innovative graph clustering approach, namely Dynamically Fusing Self-Supervised Learning (DyFSS). Specifically, DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network. To effectively learn the gating network, we design a dual-level self-supervised strategy that incorporates pseudo labels and the graph structure. Extensive experiments on five datasets show that DyFSS outperforms the state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric. The code of DyFSS is available at: https://github.com/q086/DyFSS.

node representation, representation, ssl task, (12 more...)

2401.06595

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > China > Tianjin Province > Tianjin (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(13 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)

Moreira, Rodrigo, Silva, Flavio de Oliveira, Carvalho, Tereza Cristina Melo de Brito, Martins, Joberto S. B.

Intelligent Data-Driven Architectural Features Orchestration for Network Slicing

architecture, intelligent data-driven architectural feature orchestration, network slicing, (8 more...)

Network slicing is a crucial enabler and a trend for the Next Generation Mobile Network (NGMN) and various other new systems like the Internet of Vehicles (IoV) and Industrial IoT (IIoT). Orchestration and machine learning are key elements with a crucial role in the network-slicing processes since the NS process needs to orchestrate resources and functionalities, and machine learning can potentially optimize the orchestration process. However, existing network-slicing architectures lack the ability to define intelligent approaches to orchestrate features and resources in the slicing process. This paper discusses machine learning-based orchestration of features and capabilities in network slicing architectures. Initially, the slice resource orchestration and allocation in the slicing planning, configuration, commissioning, and operation phases are analyzed. In sequence, we highlight the need for optimized architectural feature orchestration and recommend using ML-embed agents, federated learning intrinsic mechanisms for knowledge acquisition, and a data-driven approach embedded in the network slicing architecture. We further develop an architectural features orchestration case embedded in the SFI2 network slicing architecture. An attack prevention security mechanism is developed for the SFI2 architecture using distributed embedded and cooperating ML agents. The case presented illustrates the architectural feature's orchestration process and benefits, highlighting its importance for the network slicing process.

doi: 10.5281/zenodo.10479965

2401.06538

Country:

Asia > Vietnam > Hanoi > Hanoi (0.06)
South America > Brazil > São Paulo (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Telecommunications (1.00)
Information Technology (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Ivanov, Dmitry, Ben-Porat, Omer

Personalized Reinforcement Learning with a Budget of Policies

Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users. While this approach has seen success in areas like recommender systems, its expansion into high-stakes fields such as healthcare and autonomous driving is hindered by the extensive regulatory approval processes involved. To address this challenge, we propose a novel framework termed represented Markov Decision Processes (r-MDPs) that is designed to balance the need for personalization with the regulatory constraints. In an r-MDP, we cater to a diverse user population, each with unique preferences, through interaction with a small set of representative policies. Our objective is twofold: efficiently match each user to an appropriate representative policy and simultaneously optimize these policies to maximize overall social welfare. We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs. These algorithms draw inspiration from the principles of classic K-means clustering and are underpinned by robust theoretical foundations. Our empirical investigations, conducted across a variety of simulated environments, showcase the algorithms' ability to facilitate meaningful personalization even under constrained policy budgets. Furthermore, they demonstrate scalability, efficiently adapting to larger policy budgets.

agent, algorithm, welfare, (16 more...)

2401.06514

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > Lebanon (0.04)
(5 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)