Coarse-Refinement Dilemma: On Generalization Bounds for Data Clustering
Vaz, Yule, de Mello, Rodrigo Fernandes, Grossi, Carlos Henrique
This paper is organized as follows: Section 2 briefly introduces some studies related to the formalization of theoretical frameworks in the context of the Data Clustering (DC) problem; Section 3 introduces a general formulation for the DC and HC problems; Section 4 discusses the Coarse-Refinement Dilemma considering the homology group H 0; Section 5 shows that homology groups of degree greater than zero are affected by overrefined and over-coarsed topologies; Section 6 compares our proposed generalization bounds to Carlsson and M emoli [12]'s consistency; finally, conclusions and future directions are provided in Section 8. 2. Related work Data Clustering (DC) faces many challenges in defining and guaranteeing generalization from datasets, as it does not rely on labels and, consequently, it cannot take advantage of computing any evident error measurement such as risk [7]. While studying this issue, Kleinberg [8] considered that a clustering model is an application of a mapping f on top of a distance function d: I I R, given I contains indices of data points in some fixed-size set S, disregarding its ambient space though [25]. From this initial setup, Kleinberg [8] defined three properties to be respected in order to assess clustering algorithms and models: - Scale-invariance: Given a distance and a clustering function, d and f, and a scalar α, the following must hold f (d) f (αd). Thus, the similarity representation over S must be consistent with the units of measurement; - Consistency: Let Γ be a partition of S and d,d null two distance functions. Function d null is referred to as a Γ transformation of d if: (i) for all i,j S belonging to the same cluster, d null (i,j) d( i,j); and (ii) for all i,j S belonging to different clusters, d null (i,j) d( i,j). Consistency holds if f (d null) f ( d) whenever d null is a Σ transformation of d.
Nov-13-2019
- Country:
- South America > Brazil (0.04)
- North America > United States
- District of Columbia > Washington (0.04)
- New York > New York County
- New York City (0.04)
- New Jersey > Mercer County
- Princeton (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Europe
- Switzerland (0.04)
- Netherlands (0.04)
- Asia > Japan
- Genre:
- Research Report (0.81)
- Technology: