Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Görtler, Jochen; Hohman, Fred; Moritz, Dominik; Wongsuphasawat, Kanit; Ren, Donghao; Nair, Rahul; Kirchner, Marc; Patel, Kayur

arXiv.org Artificial Intelligence 

Abstract: The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances. We conduct formative research with machine learning practitioners at a large technology company and find that conventional confusion matrices do not support more complex data structures found in modern-day applications, such as hierarchical and multi-output labels. To express such variations of confusion matrices, we design an algebra that models confusion matrices as probability distributions. [...] We demonstrate Neo's utility with three case studies that help people better understand model performance and reveal hidden confusions.

Machine learning is a complex, iterative design and development practice [4, 24], where the goal is to generate a learned model that generalizes to unseen data inputs. One critical step is model evaluation, testing and inspecting a model's performance on held-out test sets of data with known labels.

A ubiquitous visualization used for model evaluation, particularly for classification models, is the confusion matrix: a tabular layout that compares a predicted class label against the actual class label for each class over all data instances.

[...] predicted class labels (synonymously, these can be flipped via a matrix transpose). These visualizations are introduced in many machine learning courses and are simultaneously used in practice to show what pairs of classes a model confuses. Succinctly, confusion matrices are [...]

Confusion matrices show a visual proxy for accuracy (e.g., entries on the diagonal of the matrix), which alone has been shown to be insufficient for many evaluations [39]. Furthermore, the diagonal of a confusion matrix often contains many more [...]
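As a concrete illustration of the conventional confusion matrix described above, the following minimal sketch builds one with NumPy, flips it with a transpose, reads accuracy off the diagonal, and normalizes the counts so that they sum to one. The helper name `confusion_matrix`, the toy labels, and the rows-as-actual convention are assumptions made for illustration. The normalization is only one simple way to view the counts as a joint probability distribution; it is not the algebra the paper designs.

```python
import numpy as np


def confusion_matrix(actual, predicted, num_classes):
    """Count instances per (actual, predicted) pair.

    Assumed convention: rows are actual classes, columns are predicted classes.
    """
    counts = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(counts, (actual, predicted), 1)
    return counts


# Hypothetical labels for a 3-class problem (illustrative data only).
actual = np.array([0, 0, 1, 1, 2, 2, 2, 1])
predicted = np.array([0, 1, 1, 1, 2, 0, 2, 2])

cm = confusion_matrix(actual, predicted, num_classes=3)

# Swapping which axis holds actual vs. predicted labels is a matrix transpose.
flipped = cm.T

# The diagonal holds correct predictions, so accuracy is trace / total.
accuracy = np.trace(cm) / cm.sum()  # 5 correct out of 8 instances -> 0.625

# Dividing by the total turns the counts into a joint distribution over
# (actual, predicted) pairs; the entries sum to 1.
joint = cm / cm.sum()

print(cm)
print(accuracy)
```

The off-diagonal cells, such as `cm[2, 0]` here, are the confusions between pairs of classes. They stay invisible when only the single accuracy number is reported.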