Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
Ariu, Kaito, Proutiere, Alexandre, Yun, Se-Young
–arXiv.org Artificial Intelligence
Community detection or clustering refers to the task of gathering similar items into a few groups from the data that, most often, correspond to observations of pair-wise interactions between items Newman and Girvan [2004]. A benchmark commonly used to assess the performance of clustering algorithms is the celebrated Stochastic Block Model (SBM) Holland et al. [1983], where pair-wise interactions are represented by a random graph. In this graph, the vertices correspond to items, and the presence of an edge between two items indicates their interaction. The SBM has been extensively studied over the last two decades; for a recent survey, see Abbe [2018]. However, it provides a relatively simplistic view of how items may interact. In real applications, interactions can be of different types (e.g., represented by ratings in recommender systems or a level of proximity between users in a social network). To capture this richer information about item interactions, the Labeled Stochastic Block Model (LSBM), proposed and analyzed in Heimlicher et al. [2012], Lelarge et al. [2013], Yun and Proutiere [2016], describes interactions by labels drawn from an arbitrary collection. The objective of this paper is to devise a clustering algorithm that, based on the observation of these labels, reconstructs the clusters of items while minimizing the expected number of misclassified items. In the following, we formally introduce LSBMs and outline our results.
arXiv.org Artificial Intelligence
Jun-18-2023
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- New York > New York County > New York City (0.04)
- Europe > United Kingdom
- Genre:
- Research Report (0.84)
- Technology: