Multidimensional Scaling and Data Clustering
Hofmann, Thomas, Buhmann, Joachim
–Neural Information Processing Systems
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problemsknown as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics,genetics and other experimental sciences. The data which are supposed to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidian space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991) which guarantees robust statistics (Tikochinsky et al., 1984).
Neural Information Processing Systems
Dec-31-1995