Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering

Nov-23-2023, arXiv.org Artificial Intelligence

This study focuses on the optimization of the Big-means algorithm for clustering large-scale datasets, exploring four distinct parallelization strategies. We conducted extensive experiments to assess the computational efficiency, scalability, and clustering performance of each approach, revealing their benefits and limitations. The paper also delves into the trade-offs between computational efficiency and clustering quality, examining the impacts of various factors. Our insights provide practical guidance on selecting the best parallelization strategy based on available resources and dataset characteristics, contributing to a deeper understanding of parallelization techniques for the Big-means algorithm.

Country:
- North America > United States (0.45)
- Europe
  - Germany (0.04)
  - Denmark > North Jutland (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
- Asia
  - Russia (0.04)
  - Kazakhstan > Almaty Region
    - Almaty (0.04)

Genre:
- Research Report > New Finding (1.00)
- Instructional Material (1.00)

Industry:
- Information Technology (0.92)
- Government > Regional Government (0.45)
- Health & Medicine > Therapeutic Area
  - Infections and Infectious Diseases (0.45)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Communications (1.00)
  - Scientific Computing (0.92)
  - Software (0.92)
  - Artificial Intelligence
    - Representation & Reasoning > Optimization (0.92)
    - Natural Language (0.92)
    - Machine Learning > Statistical Learning
      - Clustering (1.00)