Yao, Jiaxiong
Nonparametric Factor Analysis and Beyond
Zheng, Yujia, Liu, Yang, Yao, Jiaxiong, Hu, Yingyao, Zhang, Kun
Nearly all identifiability results in unsupervised representation learning inspired by, e.g., independent component analysis, factor analysis, and causal representation learning, rely on assumptions of additive independent noise or noiseless regimes. In contrast, we study the more general case where noise can take arbitrary forms, depend on latent variables, and be non-invertibly entangled within a nonlinear function. We propose a general framework for identifying latent variables in nonparametric noisy settings. We first show that, under suitable conditions, the generative model is identifiable up to certain submanifold indeterminacies even in the presence of non-negligible noise. Furthermore, under structural or distributional variability conditions, we prove that the latent variables of general nonlinear models are identifiable up to trivial indeterminacies. Based on the proposed theoretical framework, we also develop corresponding estimation methods and validate them in various synthetic and real-world settings. Interestingly, our estimate of true GDP growth from alternative measurements appears to be more informative about the underlying economies than official reports. We expect our framework to provide new insight into how both researchers and practitioners deal with latent variables in real-world scenarios.
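To make the contrast in the abstract concrete, the two noise regimes can be written side by side. The symbols $x$ (observations), $z$ (latent variables), $\varepsilon$ (noise), and $g$ (mixing function) are illustrative assumptions and are not taken from the paper itself.

```latex
% Illustrative notation only; all symbols are assumptions, not the paper's.
% Additive independent-noise model assumed by most prior identifiability results:
\[
  x = g(z) + \varepsilon, \qquad \varepsilon \perp\!\!\!\perp z .
\]
% General nonparametric noisy model described in the abstract:
% the noise enters inside the nonlinear function and may depend on the latents.
\[
  x = g(z, \varepsilon), \qquad \varepsilon \sim p(\varepsilon \mid z) .
\]
```

In the second form, $g$ need not be invertible in $\varepsilon$, which is one way to read "non-invertibly entangled within a nonlinear function".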
Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN)
Hu, Yingyao, Liu, Yang, Yao, Jiaxiong
Latent variable models are crucial in scientific research, where a key variable, such as effort, ability, or belief, is unobserved in the sample but needs to be identified. This paper proposes a novel method for estimating realizations of a latent variable $X^*$ in a random sample that contains multiple measurements of it. Under the key assumption that the measurements are independent conditional on $X^*$, we provide sufficient conditions under which realizations of $X^*$ in the sample are locally unique in a class of deviations, which allows us to identify realizations of $X^*$. To the best of our knowledge, this paper is the first to provide such identification in observation. We then use the Kullback-Leibler divergence between the two probability densities with and without conditional independence as the loss function to train a Generative Element Extraction Network (GEEN) that maps the observed measurements to realizations of $X^*$ in the sample. The simulation results suggest that the proposed estimator performs well and that the estimated values are highly correlated with realizations of $X^*$. Our estimator can be applied to a large class of latent variable models, and we expect it will change how people deal with latent variables.
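The loss described in the abstract compares a joint density with the density implied by conditional independence. The following is a minimal sketch of that idea for discrete variables, not the GEEN implementation: the function name `ci_kl`, the two-measurement setup, the integer coding, and the simulated data are all assumptions made for illustration. It computes a plug-in estimate of KL( p(x1, x2, x*) || p(x*) p(x1 | x*) p(x2 | x*) ), which is near zero when the candidate latent values make the measurements conditionally independent and clearly positive otherwise.

```python
import numpy as np

def ci_kl(x1, x2, xs, n1, n2, ns):
    """Plug-in KL between the empirical joint p(x1, x2, x*) and the
    conditional-independence product p(x*) p(x1|x*) p(x2|x*).
    Inputs are integer-coded samples; n1, n2, ns are the numbers of categories.
    (Hypothetical helper for illustration only.)"""
    joint = np.zeros((n1, n2, ns))
    np.add.at(joint, (x1, x2, xs), 1.0)   # count occurrences of each cell
    joint /= joint.sum()                  # normalize to a probability table
    p_s = joint.sum(axis=(0, 1))          # p(x*)
    p_1s = joint.sum(axis=1)              # p(x1, x*), shape (n1, ns)
    p_2s = joint.sum(axis=0)              # p(x2, x*), shape (n2, ns)
    # p(x1, x*) p(x2, x*) / p(x*) equals p(x*) p(x1|x*) p(x2|x*)
    with np.errstate(divide="ignore", invalid="ignore"):
        prod = p_1s[:, None, :] * p_2s[None, :, :] / p_s[None, None, :]
    mask = joint > 0                      # cells with zero mass contribute nothing
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))

# Simulated example (assumed data-generating process, not from the paper):
# two noisy measurements of a three-valued latent variable.
rng = np.random.default_rng(0)
n = 20000
xs_true = rng.integers(0, 3, size=n)
x1 = (xs_true + rng.choice(3, size=n, p=[0.8, 0.1, 0.1])) % 3
x2 = (xs_true + rng.choice(3, size=n, p=[0.8, 0.1, 0.1])) % 3

kl_true = ci_kl(x1, x2, xs_true, 3, 3, 3)                       # correct latent values
kl_shuffled = ci_kl(x1, x2, rng.permutation(xs_true), 3, 3, 3)  # uninformative candidate
print(kl_true, kl_shuffled)
```

With the true latent values the divergence should be close to zero up to sampling error, while the shuffled candidate leaves the dependence between the two measurements unexplained. A continuous-variable version trained by a neural network would need density estimates in place of the count table; that part is outside this sketch.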