Fast and Accurate k-means For Large Datasets
Alex Wong, School of EECS, Department of Computer Science, Oregon State University

Mar-15-2024, 02:42:45 GMT–Neural Information Processing Systems

Clustering is a popular problem with many applications. We consider the k-means problem in the setting where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant improvements to make it practical. Our approach greatly simplifies a recently developed algorithm, both in design and in analysis, and eliminates large constant factors in the approximation guarantee, the memory requirements, and the running time. We then incorporate approximate nearest neighbor search to compute k-means in o(nk) time, where n is the number of data points (note that merely computing the cost of a given solution takes Θ(nk) time). We show that our algorithm compares favorably to existing algorithms both theoretically and experimentally, providing state-of-the-art performance in theory and in practice.
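The abstract describes the approach only at a high level. As a hedged illustration, the sketch below shows the general shape of a one-pass, small-memory k-means pipeline, assuming an online-facility-location style summarization step: each arriving point opens a new weighted "facility" with probability proportional to its squared distance to the nearest facility divided by a facility cost f, and f is raised (and the summary re-compressed) whenever the summary grows too large. Weighted Lloyd iterations then run on the small summary. This is not the authors' algorithm: the names `stream_summarize`, `weighted_lloyd`, `max_facilities`, `f0`, and `beta` are hypothetical, the nearest-facility lookup is an exact linear scan rather than the approximate nearest neighbor search the abstract mentions, and no approximation guarantee is claimed. `kmeans_cost` shows why evaluating a solution takes Θ(nk) time: every point is compared with every center.

```python
import random
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Iterable[Sequence[float]], centers: Sequence[Sequence[float]]) -> float:
    # Theta(nk): every one of the n points is compared with all k centers.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def stream_summarize(stream: Iterable[Sequence[float]], max_facilities: int,
                     f0: float = 1.0, beta: float = 2.0, seed: int = 0):
    """Hypothetical one-pass summary holding at most max_facilities weighted points."""
    rng = random.Random(seed)
    f = f0  # current facility (opening) cost
    facilities: List[Point] = []
    weights: List[float] = []

    def insert(p: Point, w: float) -> None:
        if not facilities:
            facilities.append(p)
            weights.append(w)
            return
        # Exact linear scan; the abstract describes approximate nearest neighbor search instead.
        j = min(range(len(facilities)), key=lambda i: sq_dist(p, facilities[i]))
        d = sq_dist(p, facilities[j])
        if rng.random() < min(w * d / f, 1.0):
            facilities.append(p)  # open a new facility at p
            weights.append(w)
        else:
            weights[j] += w  # merge p into its nearest facility

    for p in stream:
        insert(tuple(p), 1.0)
        while len(facilities) > max_facilities:
            # Summary too large: raise the facility cost and re-summarize the weighted facilities.
            f *= beta
            old = list(zip(facilities, weights))
            facilities.clear()
            weights.clear()
            for q, w in old:
                insert(q, w)
    return facilities, weights


def weighted_lloyd(points: List[Point], weights: List[float], k: int,
                   iters: int = 20, seed: int = 0) -> List[Point]:
    """Weighted Lloyd iterations on the summary, starting from randomly sampled centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in centers]
        totals = [0.0] * len(centers)
        for p, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            totals[j] += w
            for dim, x in enumerate(p):
                sums[j][dim] += w * x
        # Weighted mean of each cluster; an empty cluster keeps its old center.
        centers = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
                   for j in range(len(centers))]
    return centers


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(cx, 0.1), rng.gauss(cy, 0.1))
            for cx, cy in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)] for _ in range(300)]
    rng.shuffle(data)
    facs, ws = stream_summarize(data, max_facilities=30)
    centers = weighted_lloyd(facs, ws, k=3)
    print(len(facs), sum(ws), kmeans_cost(data, centers))
```

In this sketch the working memory during the pass is bounded by `max_facilities` weighted points rather than by n, and the total weight of the summary always equals the number of points seen. The final clustering touches only the summary; evaluating its cost on the full data would take a second Θ(nk) pass.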

algorithm, approximation, approximation factor

Country:
- South America > Paraguay
  - Asunción > Asunción (0.04)
- North America > United States
  - Oregon (0.40)
  - California
    - Santa Clara County > Mountain View (0.04)
    - Los Angeles County > Los Angeles (0.04)
- Asia > Afghanistan
  - Parwan Province > Charikar (0.04)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Statistical Learning
    - Clustering (0.69)