Unsupervised Manifold Alignment with Joint Multidimensional Scaling
Chen, Dexiong, Fan, Bowen, Oliver, Carlos, Borgwardt, Karsten
–arXiv.org Artificial Intelligence
We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem to simultaneously generate isometric embeddings of data and learn correspondences between instances from two different datasets, while only requiring intra-dataset pairwise dissimilarities as input. This unique characteristic makes our approach applicable to datasets without access to the input features, such as solving the inexact graph matching problem. We propose an alternating optimization scheme to solve the problem that can fully benefit from the optimization techniques for MDS and Wasserstein Procrustes. We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment. Many problems in machine learning require joint visual exploration and manipulation of multiple datasets from different (heterogeneous) domains, which is generally a preferable first step prior to any further data analysis. These different data domains may consist of measurements for the same samples obtained with different methods or technologies, such as single-cell multi-omics data in bioinformatics (Demetci et al., 2022; Liu et al., 2019; Cao & Gao, 2022). Alternatively, the data could be comprised of different datasets of similar objects, such as word spaces of different languages in natural language modeling (Alvarez-Melis et al., 2019; Grave et al., 2019), or graphs representing related objects such as disease-procedure recommendation in biomedicine (Xu et al., 2019b). There are two main challenges in joint exploration of multiple datasets. First, the data from the heterogeneous domains may be high-dimensional or may not possess input features but rather only dissimilarities between them. Second, the correspondences between data instances across different domains may not be known a priori. We propose in this work to tackle both issues simultaneously while making few assumptions on the data modality.
arXiv.org Artificial Intelligence
Feb-16-2023
- Country:
- Europe (0.46)
- North America > United States (0.48)
- Genre:
- Research Report (1.00)
- Industry:
- Technology: