High-Dimensional Data Classification in Concentric Coordinates

Williams, Alice, Kovalerchuk, Boris

arXiv.org Artificial Intelligence 

Alice Williams, Department of Computer Science, Central Washington University, USA, 0009-0001-6154-2407

Boris Kovalerchuk, Department of Computer Science, Central Washington University, USA, 0000-0002-0995-9539

Abstract -- The visualization of multi-dimensional data with interpretable methods remains limited by the lack of lossless high-dimensional visualizations that both avoid occlusion and remain computationally tractable through parameterized visualization. This paper proposes a framework supporting low- to high-dimensional data using lossless Concentric Coordinates, a more compact generalization of Parallel Coordinates and of the earlier Circular Coordinates. These are forms of General Line Coordinate visualizations that can directly support the visualization of machine learning algorithms and facilitate human interaction.

A. Motivation

In many domains, accurate and interpretable classification models can be visualized faithfully. However, in many other domains, this remains a long-standing and critical roadblock to deploying artificial intelligence and machine learning (AI/ML) models. It is especially critical and challenging for high-risk tasks such as healthcare diagnostics. Visualization of multidimensional (n-D) data classification is critical for three major reasons: (1) to speed up analysis of prediction accuracy, (2) to interpret and explain classifier predictions, and (3) to improve or modify the prediction model.

B. Overview of Existing Methods

AI/ML tasks for high-dimensional (n-D) data are commonly approached with black-box deep learning (DL) methods that inherently lack interpretability and decision explanation. These approaches further rely on explainability added after model design, as is popularly done with LIME or SHAP [7].
Moreover, commonly used visualization methods preprocess data with dimension reduction (DR) methods such as Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), or other similar approximations. However, such methods are lossy and not reversible. Therefore, they commonly introduce inaccuracies into the visual verification of n-D data. Alternatively, lossless visualizations allow the use of Visual Knowledge Discovery (VKD) to visually discover algorithmic adjustments that improve ML prediction models [5].
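
The contrast between lossy DR and lossless visualization can be illustrated with a minimal sketch. It is not the method of this paper: the function names are hypothetical, and the fixed 2x4 orthonormal basis is a hand-picked stand-in for the two directions that a linear DR method such as PCA would estimate from data. The sketch shows two distinct 4-D points that collapse to the same 2-D point under the projection, while a Parallel Coordinates polyline, which keeps one vertex per dimension, can be inverted exactly.

```python
# Illustrative sketch only: all names and the projection basis are
# hypothetical and are not taken from the paper.

def project_2d(point, basis):
    """Linear dimension reduction: map an n-D point onto two basis directions."""
    return tuple(sum(b * x for b, x in zip(row, point)) for row in basis)


def to_parallel_coordinates(point):
    """Lossless encoding: one polyline vertex (axis index, value) per dimension."""
    return [(axis, value) for axis, value in enumerate(point)]


def from_parallel_coordinates(polyline):
    """Invert the encoding by reading the value back from each axis in order."""
    return tuple(value for _, value in sorted(polyline))


# Assumption: a fixed orthonormal 2x4 basis standing in for the top two
# directions that a linear DR method such as PCA would compute from data.
BASIS = [
    (0.5, 0.5, 0.5, 0.5),
    (0.5, -0.5, 0.5, -0.5),
]

a = (1.0, 2.0, 3.0, 4.0)
b = (2.0, 3.0, 2.0, 3.0)  # a different 4-D point

proj_a = project_2d(a, BASIS)
proj_b = project_2d(b, BASIS)

# Lossy: the two distinct points are indistinguishable after projection,
# so no inverse mapping can recover both of them.
collide = proj_a == proj_b

# Lossless: each polyline keeps every coordinate and can be inverted exactly.
restored_a = from_parallel_coordinates(to_parallel_coordinates(a))
restored_b = from_parallel_coordinates(to_parallel_coordinates(b))

print("projections collide:", collide)
print("polylines restore originals:", restored_a == a and restored_b == b)
```

Any linear map from 4-D to 2-D has a nontrivial null space, so such collisions are unavoidable whatever basis is chosen. The polyline encoding avoids them at the cost of drawing one vertex per dimension.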