Out-of-Sample Embedding with Proximity Data: Projection versus Restricted Reconstruction
Trosset, Michael W., Tan, Kaiyi, Tang, Minh, Priebe, Carey E.
The problem of using proximity (similarity or dissimilarity) data for the purpose of "adding a point to a vector diagram" was first studied by J.C. Gower in 1968. Since then, a number of methods -- mostly kernel methods -- have been proposed for solving what has come to be called the problem of *out-of-sample embedding*. We survey the various kernel methods that we have encountered and show that each can be derived from one or the other of two competing strategies: *projection* or *restricted reconstruction*. Projection can be analogized to a well-known formula for adding a point to a principal component analysis. Restricted reconstruction poses a different challenge: how to best approximate redoing the entire multivariate analysis while holding fixed the vector diagram that was previously obtained. This strategy results in a nonlinear optimization problem that can be simplified to a unidimensional search. Various circumstances may warrant either projection or restricted reconstruction.
May-13-2025
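The projection strategy mentioned in the abstract can be illustrated with a minimal sketch, assuming the original vector diagram came from classical multidimensional scaling of squared Euclidean-like dissimilarities. The function names `classical_mds` and `project_new_point`, and the parameter names, are hypothetical and chosen for illustration only. The formula is the familiar Gower-style "add a point" projection, not necessarily the exact formulation used by the authors, and the sketch does not attempt restricted reconstruction.

```python
import numpy as np


def classical_mds(D2, d):
    """Embed n points in d dimensions from an n x n matrix of squared dissimilarities.

    Returns the configuration X (n x d) and the d retained eigenvalues.
    Assumes the d largest eigenvalues of the centered matrix are positive.
    """
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # doubly centered "inner product" matrix
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]        # indices of the d largest eigenvalues
    lam = evals[idx]
    X = evecs[:, idx] * np.sqrt(lam)         # scale eigenvectors by sqrt(eigenvalue)
    return X, lam


def project_new_point(D2, X, lam, delta2):
    """Project an out-of-sample point into the existing configuration X.

    delta2 holds the squared dissimilarities from the new point to the n
    original points. The original configuration X is left unchanged.
    """
    b = -0.5 * (delta2 - D2.mean(axis=1))    # subtract row means of the original matrix
    b = b - b.mean()                         # center, matching the centering of B
    return (X.T @ b) / lam                   # least-squares coordinates in the diagram
```

When the dissimilarities are exact squared Euclidean distances and `d` equals the true dimension, this projection places the new point so that its distances to the original points are reproduced; with noisy or non-Euclidean proximities it gives only an approximation.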