A Proofs

A.1 Proof of theorem 1 For all W ∈ S, [...]

A.2 Proof of proposition 1 Let P ∈ {B, D, E} and let k be a valid kernel (assumptions of theorem 1) with K [...]. Inverting the conditional with Bayes' rule gives, for all W ∈ S, [...]. As a complement, we now make explicit the simple forms taken by the posterior limit graph in each case.

A.3 Proof of theorem 2 We consider the following hierarchical model [...]. The above problem is non-convex because of the rank constraint (17); nonetheless, it can be simplified, as we now show. We focus on finding the optimal eigenvectors first. Only the left term in (18) depends on R, so the optimization problem for the eigenvectors reads: min tr [...]. Note that the identity permutation, i.e. σ(i) = i for i ∈ [n], is optimal in this case [...]. We choose this U in what follows, as the signs of the axes do not influence the characterization of the final result in Z as a PCA embedding. Note that this solution is not unique if there are repeated eigenvalues.

A.4 Proof of Corollary 1 With the presented hierarchical model (Figure 1), the coupling problem is the following: min [...]. Note that the solution does not depend on ε.

B.15 E-Prior In this case the prior reads: P [...]

Figure 3: Graphical representation of the hierarchical model considered in section 4.2.
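The objective elided in the A.3 excerpt above is not recoverable, but a trace minimization over orthonormal matrices whose solution is a set of eigenvectors, unique only up to signs and up to rotations within repeated eigenvalues, matches the standard Ky Fan characterization. As a generic reference (with M a symmetric matrix standing in for whatever (18) actually contains, an assumption on our part):

\min_{U \in \mathbb{R}^{n \times d},\; U^\top U = I_d} \operatorname{tr}\!\left(U^\top M U\right) = \sum_{i=n-d+1}^{n} \lambda_i(M),

where \lambda_1 \ge \dots \ge \lambda_n are the eigenvalues of M, and the minimum is attained when the columns of U span the eigenspace of the d smallest eigenvalues. Any orthonormal basis of that eigenspace is optimal, which is exactly why the solution is not unique when eigenvalues repeat.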


A Probabilistic Graph Coupling View of Dimension Reduction

Van Assel, Hugues; Espinasse, Thibault; Chiquet, Julien; Picard, Franck

arXiv.org Machine Learning

Most popular dimension reduction (DR) methods, such as t-SNE and UMAP, are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack the clear probabilistic foundations that would enable a full understanding of their properties and limitations. To address this, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross-entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover, this reveals that the methods relying on shift-invariant kernels suffer from a statistical degeneracy that explains their poor performance in conserving coarse-grain dependencies. New links are drawn with PCA, which appears as a non-degenerate graph coupling model.
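For intuition, the coupling idea described above can be sketched as a generic SNE-style objective: shift-invariant (Gaussian) affinities P among the inputs, affinities Q among the latent points, and a cross-entropy between them minimized over the embedding. Below is a minimal NumPy sketch; the Gaussian kernel, the row normalization, and the naive finite-difference optimizer are illustrative assumptions, not the paper's exact model.

import numpy as np

def affinities(X, sigma=1.0):
    """Row-normalized Gaussian (shift-invariant) pairwise similarities."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(K, 0.0)                 # no self-similarity
    return K / K.sum(axis=1, keepdims=True)

def coupling_loss(P, Z):
    """Cross-entropy between input affinities P and latent affinities."""
    Q = affinities(Z)
    return -np.sum(P * np.log(Q + 1e-12))    # P's zero diagonal kills the log(0) terms

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                # toy high-dimensional input
Z = 1e-2 * rng.normal(size=(50, 2))          # latent embedding to optimize
P = affinities(X)

lr, eps = 0.1, 1e-4                          # arbitrary hyperparameters
for _ in range(20):                          # naive finite-difference descent
    base = coupling_loss(P, Z)
    G = np.zeros_like(Z)
    for idx in np.ndindex(*Z.shape):
        Zp = Z.copy()
        Zp[idx] += eps
        G[idx] = (coupling_loss(P, Zp) - base) / eps
    Z -= lr * G

The Gaussian kernel used here is exactly the kind of shift-invariant similarity that the abstract identifies as statistically degenerate for coarse-grain structure; the paper's contribution is to rederive such objectives from priors on hidden graphs, a framework in which non-degenerate choices, such as the one recovering PCA, also exist.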