Sparse GEMINI for Joint Discriminative Clustering and Feature Selection
Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso
arXiv.org Artificial Intelligence
Feature selection in clustering is a hard task: it requires simultaneously discovering relevant clusters and the variables that are relevant to these clusters. While feature selection algorithms are often model-based, relying on optimised model selection or strong assumptions on $p(\pmb{x})$, we introduce a discriminative clustering model that maximises a geometry-aware generalisation of the mutual information called GEMINI, combined with a simple $\ell_1$ penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and scales easily to high-dimensional data and large numbers of samples, while requiring only the design of a clustering model $p_\theta(y|\pmb{x})$. We demonstrate the performance of Sparse GEMINI on synthetic datasets as well as large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm that can select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
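To make the idea of a penalised discriminative clustering objective concrete, here is a minimal sketch, not the authors' implementation. It makes several assumptions that the abstract does not state: the clustering model $p_\theta(y|\pmb{x})$ is a linear softmax, the objective is the plain mutual information rather than GEMINI (whose geometry-aware form is not given here), and the $\ell_1$ penalty is applied to every entry of the weight matrix. The names `penalised_objective` and `selected_features` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def penalised_objective(W, b, X, lam):
    """Mutual information I(x; y) minus an l1 penalty on W.

    Assumption: plain mutual information stands in for GEMINI, and
    p_theta(y|x) is a linear softmax with weights W and bias b.
    """
    p = softmax(X @ W + b)  # cluster probabilities, shape (n_samples, n_clusters)
    eps = 1e-12             # avoids log(0)
    marginal = p.mean(axis=0)
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    h_conditional = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    mutual_information = h_marginal - h_conditional
    penalty = lam * np.abs(W).sum()
    return mutual_information - penalty


def selected_features(W, tol=1e-8):
    """Indices of features whose weight row is not (numerically) zero."""
    return np.flatnonzero(np.abs(W).sum(axis=1) > tol)


# Toy usage: only feature 0 carries non-zero weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
W = np.zeros((4, 2))
W[0] = [3.0, -3.0]
b = np.zeros(2)
value = penalised_objective(W, b, X, lam=0.1)
kept = selected_features(W)
```

In this sketch, a feature counts as discarded once its whole row of `W` is driven to zero, so feature selection is a by-product of maximising the penalised objective rather than a separate search over subsets. In practice the objective would be maximised with a gradient-based or proximal optimiser, which is omitted here.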
Feb-7-2023