Normalized mutual information is a biased measure for classification and community detection
Maximilian Jerdee, Alec Kirkley, M. E. J. Newman
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we show that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
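To make the first bias concrete, here is a minimal sketch of the conventional, symmetrically normalized mutual information, computed from two label lists. It is not the modified measure the authors introduce. The function names (`entropy`, `mutual_information`, `nmi`) and the toy labelings are assumptions chosen for illustration, as is the choice of the arithmetic mean of the two entropies as the normalizer.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def mutual_information(truth, found):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(truth)
    joint = Counter(zip(truth, found))  # nonzero cells of the contingency table
    row = Counter(truth)                # row sums
    col = Counter(found)                # column sums
    return sum((c / n) * log(n * c / (row[r] * col[s]))
               for (r, s), c in joint.items())


def nmi(truth, found):
    """Conventional NMI: mutual information divided by the mean entropy."""
    denom = 0.5 * (entropy(truth) + entropy(found))
    if denom == 0:
        # Both labelings put everything in one group; treated as identical
        # here (a convention assumed for this sketch).
        return 1.0
    return mutual_information(truth, found) / denom


# Hypothetical toy data: two ground-truth groups of four objects each.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
singletons = list(range(8))   # every object in its own group
one_group = [0] * 8           # every object in the same group

print(nmi(truth, truth))       # perfect recovery
print(nmi(truth, singletons))  # approximately 0.5, despite carrying no real information
print(nmi(truth, one_group))   # 0.0
```

In this toy case the all-singletons output scores about 0.5 even though it says nothing useful about the two true groups, which is the kind of behavior the abstract attributes to ignoring the information content of the contingency table.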
Jul-3-2023
- Country:
- Africa > Cameroon > Gulf of Guinea (0.04)
- Asia > China > Hong Kong (0.05)
- Europe > Netherlands > South Holland > Leiden (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > New York (0.04)
- Genre:
- Research Report (1.00)
- Technology: