Deep Multimodal Subspace Clustering Networks
Mahdi Abavisani, Vishal M. Patel
Abstract--We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages: a multimodal encoder, a self-expressive layer, and a multimodal decoder. The encoder takes multimodal data as input and fuses it into a latent-space representation. We investigate early, late, and intermediate fusion techniques and propose three corresponding encoders for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same across the different spatial fusion-based approaches. In addition to the spatial fusion-based methods, an affinity fusion-based network is proposed in which the self-expressive layers corresponding to different modalities are enforced to be the same. Extensive experiments on three datasets show that the proposed methods significantly outperform state-of-the-art multimodal subspace clustering methods.

Many practical applications in image processing, computer vision, and speech processing require one to process very high-dimensional data. However, these data often lie in low-dimensional subspaces. For instance, facial images with variation in illumination [1], handwritten digits [2], and trajectories of a rigidly moving object in a video [3] are all examples where high-dimensional data can be represented by low-dimensional subspaces. Subspace clustering algorithms exploit this fact to find clusters in different subspaces within a dataset [4].
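The self-expressive layer mentioned above builds on the classical self-expressiveness property: a point drawn from a union of subspaces can be written as a linear combination of other points from the same subspace. The sketch below is a simplified, ridge-regularized toy version with synthetic data (not the paper's learned network) that shows how the resulting coefficient matrix yields a block-structured affinity:

```python
import numpy as np

# Toy data (an assumption for illustration): columns of X lie in a union of
# two orthogonal 1-D subspaces of R^3, spanned by unit vectors u and v.
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
X = np.column_stack([2 * u, -1 * u, 3 * u, 1 * v, -2 * v, 4 * v])  # 3 x 6

# Ridge-regularized self-expression: minimize ||X - XC||_F^2 + lam * ||C||_F^2,
# whose closed-form solution is C = (X^T X + lam I)^{-1} X^T X.
lam = 1e-3
N = X.shape[1]
C = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ X)
np.fill_diagonal(C, 0.0)  # a point should not explain itself

# Symmetric affinity built from the coefficients. Because the two toy
# subspaces are orthogonal, W is block-diagonal: strong affinities within
# each subspace, (numerically) zero affinities across subspaces.
W = np.abs(C) + np.abs(C).T
```

In the paper's networks, the role of X is played by the latent features produced by the multimodal encoder, C becomes the trainable weights of the self-expressive layer, and the affinity W is passed to spectral clustering to obtain the final segmentation.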
Apr-17-2018