Probabilistic Principal Component Analysis
Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data
Missing Not At Random (MNAR) values, where the probability of having missing data may depend on the missing value itself, are notoriously difficult to account for in analyses, although they are very frequent in data. One solution for handling MNAR data is to specify a model for the missing-data mechanism, which makes inference or imputation tasks more complex. Furthermore, this implies strong \textit{a priori} assumptions on the parametric form of the distribution. However, some works have obtained guarantees on the estimation of parameters in the presence of MNAR data without specifying the distribution of the missing data \citep{mohan2018estimation, tang2003analysis}. This is very useful in practice but is limited to simple cases, such as a few self-masked MNAR variables in data generated according to linear regression models.
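The self-masking described above can be made concrete with a small simulation: a minimal sketch, assuming a scalar outcome from a linear regression and a logistic missingness mechanism (both hypothetical choices, not taken from the paper), showing how an MNAR mechanism biases the naive mean of the observed values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical outcome generated from a linear regression model.
x = rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)

# Self-masked MNAR: the probability that y is missing depends on y itself
# (here through an assumed logistic mechanism), so the observed y's
# form a biased sample.
p_missing = 1.0 / (1.0 + np.exp(-y))   # larger y -> more likely missing
missing = rng.random(n) < p_missing
y_obs = y[~missing]

# The naive mean over observed values underestimates the true E[y] = 0.
print(round(float(y_obs.mean()), 2))
```

Because large values of `y` are preferentially masked, the observed-data mean lands well below zero, which is exactly why unadjusted inference fails under MNAR.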
Review for NeurIPS paper: Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data
Additional Feedback: I have read the other reviews and the authors' feedback. With the addition of the recommender system experiment, walking the reader through (1) how the MNAR and PPCA models apply in this setting, (2) how to select the hyper-parameters for the imputation algorithm, and (3) how the imputations compare with prior algorithms helps make a strong case for the proposed method. If the authors re-arrange the paper to improve clarity (as the reviews point out, and as they promise in their feedback), the paper can be substantially stronger. There are a few lingering questions from the reviews that the authors should address in the paper at a minimum: (1) a discussion of a stage-wise approach to imputation (and why it may not be necessary for their sequence of regressions), and (2) given that some of the linear coefficients can be zero, what a practitioner must do when one of the regressions estimates a coefficient close to 0 that is then used in the denominator of other estimates. Even better if the illustration is grounded in an example like movie item ratings.
On the Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis
Probabilistic principal component analysis (PPCA) is currently one of the most used statistical tools to reduce the ambient dimension of the data. From multidimensional scaling to the imputation of missing data, PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance. Despite this broad applicability, it is well known that maximum likelihood estimation (MLE) can only recover the true model parameters up to a rotation. The main obstruction is posed by the inherent non-identifiability of the PPCA model, resulting from the rotational symmetry of its parameterization. To resolve this ambiguity, we propose a novel approach using quotient topological spaces and, in particular, we show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space.
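The rotational ambiguity mentioned in the abstract is easy to verify numerically: a minimal sketch (dimensions, seed, and rotation angle are arbitrary illustrative choices) showing that the PPCA marginal covariance $C = WW^\top + \sigma^2 I$, and hence the Gaussian likelihood, is unchanged when the loading matrix $W$ is replaced by $WR$ for any orthogonal $R$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 5, 2  # illustrative ambient and latent dimensions

# A loading matrix W and noise variance sigma^2 define the PPCA
# marginal covariance C = W W^T + sigma^2 I.
W = rng.standard_normal((d, q))
sigma2 = 0.5
C = W @ W.T + sigma2 * np.eye(d)

# Any orthogonal (rotation) matrix R applied on the right leaves C
# unchanged, since (W R)(W R)^T = W R R^T W^T = W W^T, so the
# likelihood cannot distinguish W from W R.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
C_rotated = (W @ R) @ (W @ R).T + sigma2 * np.eye(d)

print(np.allclose(C, C_rotated))  # True: identical model covariance
```

This is precisely why MLE can only identify $W$ up to rotation, motivating the quotient-space formulation of consistency.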
A Dual Formulation for Probabilistic Principal Component Analysis
De Plaen, Henri, Suykens, Johan A. K.
In this paper, we characterize Probabilistic Principal Component Analysis in Hilbert spaces and demonstrate how the optimal solution admits a representation in dual space. This allows us to develop a generative framework for kernel methods. Furthermore, we show how it englobes Kernel Principal Component Analysis and illustrate its working on a toy and a real dataset.

(From the introduction:) ... PCA, but rather in another model based on similar principles. More recently, Restricted Kernel Machines (Suykens, 2017) opened a new door for a probabilistic version of PCA, both in primal and dual. They essentially use the Fenchel-Young inequality on a variational formulation of KPCA (Suykens et al., 2003; Alaíz et al., 2018) to obtain an energy function, ...
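The dual-space viewpoint referenced above can be illustrated with classical kernel PCA, where the principal directions are recovered from an eigendecomposition of the centered Gram matrix rather than of the feature-space covariance. A minimal sketch, assuming an RBF kernel and toy Gaussian data (both arbitrary choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))  # toy data: 50 points in 3 dimensions

# Assumed RBF kernel; the dual formulation works for generic kernels.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)

# Dual-space KPCA: center the Gram matrix and eigendecompose it.
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
Kc = H @ K @ H
eigvals, eigvecs = np.linalg.eigh(Kc)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

# Projections of the training points onto the first two components,
# obtained entirely from dual (Gram-matrix) quantities.
scores = eigvecs[:, :2] * np.sqrt(np.maximum(eigvals[:2], 0.0))
print(scores.shape)  # (50, 2)
```

The point of the dual representation is that everything is expressed through the $n \times n$ Gram matrix, never through explicit feature maps.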
On Robust Probabilistic Principal Component Analysis using Multivariate $t$-Distributions
Guo, Yiping, Bondell, Howard D.
Principal Component Analysis (PCA) is a common multivariate statistical analysis method, and Probabilistic Principal Component Analysis (PPCA) is its probabilistic reformulation within the framework of the Gaussian latent variable model. To improve the robustness of PPCA, it has been proposed to replace the underlying Gaussian distributions with multivariate $t$-distributions. Based on the representation of the $t$-distribution as a scale mixture of Gaussians, a hierarchical model is used for implementation. However, although these robust PPCA methods work reasonably well in some simulation studies and on real data, the hierarchical model as implemented does not yield an equivalent interpretation. In this paper, we present a set of equivalence relationships between those models, and discuss the performance of robust PPCA methods with different multivariate $t$-distributed structures through several simulation studies. In doing so, we clarify a current misrepresentation in the literature and make connections between a set of hierarchical models for robust PPCA.
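The scale-mixture representation the abstract relies on can be checked by simulation: a minimal sketch (univariate case, degrees of freedom and seed chosen arbitrarily) drawing $x \mid u \sim \mathcal{N}(0, 1/u)$ with $u \sim \mathrm{Gamma}(\nu/2, \mathrm{rate}=\nu/2)$, whose marginal is the $t_\nu$ distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
n, nu = 200_000, 5.0  # nu: degrees of freedom (illustrative choice)

# Scale-mixture representation: u ~ Gamma(nu/2, rate=nu/2), i.e.
# scale = 2/nu in NumPy's shape/scale parameterization, and
# x | u ~ N(0, 1/u) gives marginally x ~ t_nu.
u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
x = rng.standard_normal(n) / np.sqrt(u)

# Sanity check: compare the sample variance with the theoretical
# t_nu variance nu / (nu - 2).
print(round(float(x.var()), 1), round(nu / (nu - 2.0), 1))
```

The hierarchical (conditionally Gaussian) form is what makes EM-style inference tractable in robust PPCA, which is why the equivalence of the different hierarchical formulations matters.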
Adaptive probabilistic principal component analysis
Farooq, Adam, Raykov, Yordan P., Evers, Luc, Little, Max A.
Using the linear Gaussian latent variable model as a starting point, we relax some of the constraints it imposes by deriving a nonparametric latent-feature Gaussian variable model. This model introduces additional discrete latent variables to the original structure. The Bayesian nonparametric nature of this new model allows it to adapt its complexity as more data are observed and to project each data point onto a varying number of subspaces. The linear relationship between the continuous latent and observed variables makes the proposed model straightforward to interpret, resembling a locally adaptive probabilistic PCA (A-PPCA). We propose two alternative Gibbs sampling procedures for inference in the new model and demonstrate its applicability on sensor data for passive health monitoring.
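The linear Gaussian latent variable model that serves as the starting point above is the standard PPCA generative process: a minimal sketch (dimensions, noise level, and seed are illustrative choices) sampling $z \sim \mathcal{N}(0, I_q)$, $x \mid z \sim \mathcal{N}(Wz + \mu, \sigma^2 I_d)$, and checking the implied marginal covariance.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, q = 50_000, 4, 2  # samples, observed dim, latent dim (assumed)

# Linear Gaussian latent variable model underlying PPCA:
#   z ~ N(0, I_q),   x | z ~ N(W z + mu, sigma^2 I_d)
W = rng.standard_normal((d, q))
mu = np.zeros(d)
sigma = 0.3
Z = rng.standard_normal((n, q))
X = Z @ W.T + mu + sigma * rng.standard_normal((n, d))

# The marginal distribution of x is N(mu, W W^T + sigma^2 I),
# so the sample covariance should match the model covariance.
C_model = W @ W.T + sigma**2 * np.eye(d)
C_sample = np.cov(X, rowvar=False)
print(np.abs(C_model - C_sample).max() < 0.2)
```

The nonparametric extension in the paper keeps this linear likelihood but adds discrete latent variables so that the effective subspace can vary per data point.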