Towards Automatic Clustering Analysis using Traces of Information Gain: The InfoGuide Method
Rocha, Paulo; Pinheiro, Diego; Cadeiras, Martin; Bastos-Filho, Carmelo
Clustering analysis has become a ubiquitous information retrieval tool in a wide range of domains, but a more automatic framework is still lacking. Although internal metrics are key to successfully retrieving clusters, their effectiveness on real-world datasets is still not fully understood, mainly because of the unrealistic assumptions they make about the underlying datasets. We hypothesized that capturing {\it traces of information gain} between increasingly complex clustering retrievals, the {\it InfoGuide} method, enables an automatic clustering analysis with improved clustering retrievals. We validated the {\it InfoGuide} hypothesis by capturing the traces of information gain using the Kolmogorov-Smirnov statistic and comparing the clusters retrieved by {\it InfoGuide} against those retrieved by other commonly used internal metrics on artificially generated, benchmark, and real-world datasets. Our results suggest that {\it InfoGuide} can enable a more automatic clustering analysis and may be more suitable for retrieving clusters in real-world datasets displaying nontrivial statistical properties.
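As a rough illustration of the statistic named above, the following minimal sketch computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) and applies it to consecutive entries of a sequence. The abstract does not say which per-clustering quantity InfoGuide compares, so `samples_by_k` is a hypothetical placeholder: one list of numbers per clustering, ordered by increasing complexity. The function names `ks_statistic` and `ks_trace` are likewise illustrative and not taken from the paper.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at data
    # points, so checking every observed value is enough to find the maximum.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_trace(samples_by_k):
    """KS statistic between each pair of consecutive samples.

    samples_by_k is a hypothetical input (an assumption of this sketch):
    one list of values per clustering, ordered by increasing complexity.
    """
    return [ks_statistic(s, t) for s, t in zip(samples_by_k, samples_by_k[1:])]


# Hypothetical values, for illustration only.
samples_by_k = [
    [0.9, 1.1, 1.0, 1.2],
    [0.4, 0.5, 0.6, 0.5],
    [0.4, 0.5, 0.6, 0.5],
]
trace = ks_trace(samples_by_k)
print(trace)
```

In this toy input the first two samples do not overlap and the last two are identical, so the trace drops from the maximum value of the statistic to zero. How InfoGuide turns such a trace into a choice of clustering is not described in the abstract and is not modeled here.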
Jan-23-2020
- Country:
- North America > United States
- California
- Alameda County > Berkeley (0.04)
- Yolo County > Davis (0.04)
- New Jersey > Camden County
- Jackson (0.04)
- South America > Brazil
- Pernambuco (0.04)
- Genre:
- Research Report
- Experimental Study (0.48)
- New Finding (0.68)