Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering
Mussabayev, Ravil; Mussabayev, Rustam
arXiv.org Artificial Intelligence
This study focuses on the optimization of the Big-means algorithm for clustering large-scale datasets, exploring four distinct parallelization strategies. We conducted extensive experiments to assess the computational efficiency, scalability, and clustering performance of each approach, revealing their benefits and limitations. The paper also delves into the trade-offs between computational efficiency and clustering quality, examining the impacts of various factors. Our insights provide practical guidance on selecting the best parallelization strategy based on available resources and dataset characteristics, contributing to a deeper understanding of parallelization techniques for the Big-means algorithm.
Nov-23-2023
- Country:
  - Europe (0.67)
  - North America > United States (0.45)
- Genre:
  - Instructional Material (1.00)
  - Research Report > New Finding (1.00)
- Industry:
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Statistical Learning
        - Clustering (1.00)
      - Natural Language (0.92)
      - Representation & Reasoning > Optimization (0.92)
    - Communications (1.00)
    - Data Science > Data Mining (1.00)
    - Scientific Computing (0.92)
    - Software (0.92)