Scaling Up Deep Clustering Methods Beyond ImageNet-1K

Adaloglou, Nikolas, Michels, Felix, Senft, Kaspar, Petrusheva, Diana, Kollmann, Markus

Jun-3-2024, arXiv.org Artificial Intelligence

Deep image clustering methods are typically evaluated on small-scale, balanced classification datasets, while feature-based $k$-means has been applied to proprietary billion-scale datasets. In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks while disentangling the impact of the following data-related factors: i) class imbalance, ii) class granularity, iii) easy-to-recognize classes, and iv) the ability to capture multiple classes. To this end, we develop multiple new benchmarks based on ImageNet21K. Our experimental analysis reveals that feature-based $k$-means is often unfairly evaluated on balanced datasets. However, deep clustering methods outperform $k$-means across most large-scale benchmarks. Interestingly, $k$-means underperforms on easy-to-classify benchmarks by large margins. The performance gap, however, diminishes in the highest data regimes, such as ImageNet21K. Finally, we find that non-primary cluster predictions capture meaningful classes (i.e., coarser classes).
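As a rough illustration of the feature-based $k$-means baseline the abstract refers to, the sketch below runs plain Lloyd's $k$-means on pre-computed feature vectors and scores it with clustering accuracy under the best one-to-one cluster-to-class mapping. It is not the authors' pipeline: the synthetic 2-D blobs stand in for features from a pretrained network, the class sizes (200, 50, 10) are an assumed toy imbalance, and the helper names (`make_blobs`, `kmeans`, `clustering_accuracy`) are hypothetical. The brute-force search over permutations is only feasible for a handful of clusters; at benchmark scale an assignment solver such as the Hungarian algorithm would typically be used instead.

```python
import itertools
import random


def make_blobs(sizes, centers, std=1.0, seed=0):
    """Synthetic stand-in for pretrained features: one Gaussian blob per class."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (n, center) in enumerate(zip(sizes, centers)):
        for _ in range(n):
            points.append([rng.gauss(c, std) for c in center])
            labels.append(label)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def clustering_accuracy(y_true, y_pred, k):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if perm[p] == t)
        best = max(best, hits)
    return best / len(y_true)


# Assumed toy setup: three imbalanced classes with 200, 50 and 10 samples.
features, labels = make_blobs([200, 50, 10], [(0, 0), (6, 6), (0, 8)])
pred = kmeans(features, k=3)
acc = clustering_accuracy(labels, pred, k=3)
print(f"clustering accuracy: {acc:.3f}")
```

Changing the class sizes or moving the blob centers closer together gives a quick feel for how imbalance and class granularity can affect such a baseline, although this toy setting says nothing about the paper's actual results.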

benchmark, imagenet-1k, imagenet21k

Country:
- Europe
  - Slovenia > Drava
    - Municipality of Benedikt > Benedikt (0.04)
  - Germany > North Rhine-Westphalia
    - Düsseldorf Region > Düsseldorf (0.05)
- Asia > Middle East
  - Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report (1.00)

Industry:
- Leisure & Entertainment > Sports (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Clustering (1.00)
  - Neural Networks > Deep Learning (0.93)
