Scalable Training of Mixture Models via Coresets

Mar-14-2024, 23:53:37 GMT - Neural Information Processing Systems

How can we train a statistical mixture model on a massive data set? In this paper, we show how to construct coresets for mixtures of Gaussians and natural generalizations. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset will also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size independent of the size of the data set.
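The idea of a weighted subset standing in for the full data can be illustrated with a minimal sketch. It does not reproduce the paper's coreset construction, which the abstract does not describe. As an assumption for illustration only, it draws a uniform random subset and gives every point the weight n/m, then compares the weighted Gaussian-mixture log-likelihood on the subset with the log-likelihood on the full data. Uniform sampling carries no guarantee of the kind stated above, and the function names `gmm_log_likelihood` and `uniform_weighted_subset` are hypothetical.

```python
import numpy as np


def gmm_log_likelihood(X, weights, means, covs, mix):
    """Weighted log-likelihood of the rows of X under a Gaussian mixture.

    X: (n, d) points; weights: (n,) non-negative point weights;
    means, covs, mix: per-component mean vectors, covariance matrices,
    and mixing proportions.
    """
    n, d = X.shape
    log_comp = np.empty((n, len(mix)))
    for k, (mu, cov, pi) in enumerate(zip(means, covs, mix)):
        diff = X - mu
        # Squared Mahalanobis distance without forming the inverse explicitly.
        maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
        _, logdet = np.linalg.slogdet(cov)
        log_comp[:, k] = np.log(pi) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    # Log-sum-exp over components for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    log_px = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(np.sum(weights * log_px))


def uniform_weighted_subset(X, m, rng):
    """Illustrative stand-in, NOT the paper's method: sample m points
    uniformly without replacement and weight each by n / m."""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx], np.full(m, len(X) / m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 1.0, (10000, 2)),
                   rng.normal(2.0, 1.0, (10000, 2))])
    params = ([np.array([-2.0, -2.0]), np.array([2.0, 2.0])],
              [np.eye(2), np.eye(2)],
              [0.5, 0.5])
    full = gmm_log_likelihood(X, np.ones(len(X)), *params)
    C, w = uniform_weighted_subset(X, 2000, rng)
    approx = gmm_log_likelihood(C, w, *params)
    print("relative error:", abs(full - approx) / abs(full))
```

The weights sum to the size of the original data set, so the weighted log-likelihood is on the same scale as the full one. A coreset in the sense of the abstract would require this kind of agreement for every candidate model, which a uniform sample does not promise.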

algorithm, coreset, gaussian

Country:
- North America > United States
  - California (0.14)
- Europe > Switzerland
  - Zürich > Zürich (0.04)

Technology:
- Information Technology
  - Data Science > Data Mining (0.68)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning > Statistical Learning (0.96)