multidimensional scaling and data clustering
Multidimensional Scaling and Data Clustering
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial op(cid:173) timization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data.
Multidimensional Scaling and Data Clustering
Hofmann, Thomas, Buhmann, Joachim
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data which are supposed to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidian space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991) which guarantees robust statistics (Tikochinsky et al., 1984).
Multidimensional Scaling and Data Clustering
Hofmann, Thomas, Buhmann, Joachim
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data which are supposed to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidian space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991) which guarantees robust statistics (Tikochinsky et al., 1984).
Multidimensional Scaling and Data Clustering
Hofmann, Thomas, Buhmann, Joachim
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problemsknown as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics,genetics and other experimental sciences. The data which are supposed to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidian space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991) which guarantees robust statistics (Tikochinsky et al., 1984).