Parameter Free Clustering with Cluster Catch Digraphs (Technical Report)
Manukyan, Artür; Ceyhan, Elvan
Clustering is one of the most challenging tasks in machine learning and pattern recognition, and discovering the exact number of clusters in an unlabelled data set is perhaps the leading challenge within it. Many clustering methods find the clusters (or hidden classes) and the number of these clusters simultaneously (Frey and Dueck, 2007; Sajana et al., 2016). Although methods exist for validating and comparing the quality of a partitioning of a data set, algorithms that provide the (estimated) number of clusters without any input parameter remain appealing. However, such algorithms typically rely on other parameters, which can be viewed as the intensity, i.e. the expected number of objects in a unit area. The value of the intensity parameter works as a threshold: if the local intensity of the data set exceeds the threshold, this may indicate the existence of a cluster. Choosing such parameters is often difficult, since different values may drastically change the result of the algorithm. We use unsupervised adaptations of a family of vertex random digraphs, namely class cover catch digraphs (CCCDs), which have shown relatively good performance in statistical pattern classification (Manukyan and Ceyhan, 2016; Priebe et al., 2003a). Unsupervised versions of CCCDs are called cluster catch digraphs (CCDs) (DeVinney, 2003; Marchette, 2004). Primarily, CCDs use statistics that require an intensity parameter to be specified or estimated.
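As a rough illustration of why an intensity threshold is hard to choose, the following minimal sketch flags points whose local intensity exceeds a given threshold. It is not the CCD method of the report: the function names (`local_intensity`, `dense_points`), the disk-shaped neighbourhood, the radius, and the toy data are all assumptions made for this example.

```python
import math

def local_intensity(points, center, radius):
    """Number of points within `radius` of `center`, divided by the disk area.

    Hypothetical helper: a crude estimate of 'objects per unit area'.
    """
    count = sum(1 for p in points if math.dist(p, center) <= radius)
    return count / (math.pi * radius ** 2)

def dense_points(points, radius, threshold):
    """Points whose local intensity exceeds `threshold` (candidate cluster members)."""
    return [p for p in points if local_intensity(points, p, radius) > threshold]

# Toy data (assumed): a tight group of four points plus two isolated points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (2.0, 2.0), (5.0, 5.0)]

# The same data under three thresholds gives three different answers.
for threshold in (1.0, 2.0, 6.0):
    flagged = dense_points(points, radius=0.5, threshold=threshold)
    print(threshold, len(flagged))
```

With this toy data, a low threshold flags every point, an intermediate one flags only the tight group, and a high one flags nothing, which is the sensitivity that motivates a parameter-free approach.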
Dec-26-2019
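- Country:
- Asia > China (0.04)
- Europe > Austria > Vienna (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Alabama > Lee County > Auburn (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Florida > Orange County > Orlando (0.04)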
- Genre:
- Research Report (0.81)
- Industry:
- Health & Medicine (0.92)