GenURL: A General Framework for Unsupervised Representation Learning

Li, Siyuan, Zang, Zelin, Wu, Di, Chen, Zhiyuan, Li, Stan Z.

arXiv.org Artificial Intelligence 

Recently unsupervised representation learning (URL) has achieved remarkable progress in various scenarios. However, most methods are specifically designed based on specific data characters or task assumptions. Based on the manifold assumption, we regard most URL problems as an embedding problem that seeks an optimal low-dimensional representation of the given high-dimensional data. We split the embedding process into two steps, data structural modeling and low-dimensional embedding, and propose a general similarity-based framework called GenURL. Specifically, we provide a general method to model data structures by adaptively combining graph distances on the feature space and predefined graphs, then propose robust loss functions to learn the low-dimensional embedding. Combining with a specific pretext task, we can adapt GenURL to various URL tasks in a unified manner and achieve state-of-the-art performance, including self-supervised visual representation learning, unsupervised knowledge distillation, graph embeddings, and dimension reduction. Moreover, ablation studies of loss functions and basic hyper-parameter settings in GenURL illustrate the data characters of various tasks.