Correction for Closeness: Adjusting Normalized Mutual Information Measure for Clustering Comparison
Pizzuti C
2017
Abstract
Normalized mutual information (NMI) is a widely used measure for comparing community detection methods. Recently, however, it has been argued that information-theoretic measures need adjustment because of the so-called selection bias problem: they tend to favor clustering solutions with more communities. In this article, an experimental evaluation of these measures is performed to investigate the problem in depth, and an adjustment that scales the values of these measures is proposed. Experiments on synthetic networks, for which the ground-truth division is known, show that scaled NMI does not exhibit the selection bias behavior. Moreover, a comparison of some well-known community detection methods on synthetically generated networks shows that scaled NMI behaves more fairly, especially when the network topology does not present a clear community structure. Experiments on two real-world networks further show that the corrected formula makes it possible to choose, from a set of methods, the one whose network division better reflects the ground-truth structure.
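The selection bias described in the abstract can be illustrated with a small sketch. The code below is not the scaled NMI proposed in the article, whose formula is not given here. It computes plain NMI under one common normalization, assumed for this example: 2·I(A;B) / (H(A) + H(B)). The toy partitions `truth`, `noisy`, and `singletons` are hypothetical data chosen for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a partition given as a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same node set."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[x] * count_b[y]))
        for (x, y), n_ij in joint.items()
    )


def nmi(a, b):
    """Plain NMI, normalized as 2*I(A;B) / (H(A) + H(B)) (assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both partitions consist of a single cluster
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical toy data: 12 nodes, two ground-truth communities of 6 nodes.
truth = [0] * 6 + [1] * 6
# Two communities, with one node misplaced on each side (10 of 12 correct).
noisy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
# Every node in its own community: 12 clusters, no real structure found.
singletons = list(range(12))

print(f"NMI(truth, noisy)      = {nmi(truth, noisy):.3f}")       # about 0.350
print(f"NMI(truth, singletons) = {nmi(truth, singletons):.3f}")  # about 0.436
```

In this toy case the all-singletons partition scores higher than a two-community partition that places 10 of 12 nodes correctly, which is the kind of preference for more communities that an adjustment is meant to remove.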