Identifiability-Guaranteed Simplex-Structured Post-Nonlinear Mixture Learning via Autoencoder
Unsupervised mixture learning (UML) aims to unravel the aggregated and entangled latent components underlying ambient data, without using any training samples. This task is also known as blind source separation (BSS) and factor analysis in the literature [1]. UML has a long history in the signal processing and machine learning communities; see, e.g., the early seminal work on independent component analysis (ICA) [1]. Many important applications can be cast as UML problems, e.g., audio/speech separation [2], EEG signal denoising [3], image representation learning [4], hyperspectral unmixing [5], and topic mining [6], to name just a few. Arguably one of the most important aspects of UML/BSS is the so-called identifiability problem: is it possible to identify the mixed latent components from the mixtures in an unsupervised manner? The UML problem is often ill-posed, since the solution is generally non-unique; see, e.g., the discussions in [1, 7]. To establish identifiability, one may exploit prior knowledge of the mixing process and/or the latent components. Various frameworks have been proposed for unraveling linearly mixed latent components by exploiting their properties, e.g., statistical independence, nonnegativity, boundedness, sparsity, and simplex structure. These properties lead to many well-known unsupervised learning models, namely ICA [1], nonnegative matrix factorization (NMF) [7], bounded component analysis (BCA) [8], sparse component analysis (SCA) [9], and simplex-structured matrix factorization (SSMF) [2, 6, 10]. These structures often stem from the physical meaning of the respective engineering problems.
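To make the ill-posedness concrete, the following is a minimal sketch under an assumed noiseless linear mixture model. The symbols $\boldsymbol{x}_\ell$, $\boldsymbol{A}$, $\boldsymbol{s}_\ell$, $\boldsymbol{Q}$ and the dimensions $M$, $K$, $N$ are illustrative notation introduced here, not taken from the text above.

```latex
% Assumed (illustrative) linear mixture model with N samples and K latent components.
\boldsymbol{x}_\ell = \boldsymbol{A}\boldsymbol{s}_\ell, \quad \ell = 1,\ldots,N,
\qquad \boldsymbol{A}\in\mathbb{R}^{M\times K},\ \boldsymbol{s}_\ell\in\mathbb{R}^{K}.

% Any nonsingular Q gives another exact factorization of the same observations.
\boldsymbol{x}_\ell = (\boldsymbol{A}\boldsymbol{Q})\,(\boldsymbol{Q}^{-1}\boldsymbol{s}_\ell),
\quad \forall\, \boldsymbol{Q}\in\mathbb{R}^{K\times K} \text{ nonsingular}.
```

Without further structure, the pair $(\boldsymbol{A}\boldsymbol{Q}, \boldsymbol{Q}^{-1}\boldsymbol{s}_\ell)$ explains the data exactly as well as $(\boldsymbol{A}, \boldsymbol{s}_\ell)$. Properties such as independence, nonnegativity, or simplex structure restrict the admissible $\boldsymbol{Q}$, ideally to trivial ambiguities such as permutation and scaling.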
Jun-16-2021