On a convergence property of a geometrical algorithm for statistical manifolds
Shotaro Akaho, Hideitsu Hino, Noboru Murata
Information geometry is a framework for analyzing statistical inference and machine learning [2]. Geometrically, statistical inference and many machine learning algorithms can be regarded as procedures for finding a projection from a given data point onto a model subspace. In this paper, we focus on an algorithm for finding this projection. Since the projection is given by minimizing a divergence, a common approach is a gradient-based method [6]. However, such an approach is not applicable in some cases. For instance, there have been several attempts to extend the information-geometric framework to nonparametric settings [3, 9, 13, 15], where one must work in a function space or where each data point is represented as a point process. In such cases, it is difficult to compute the derivative of the divergence required by gradient-based methods, and it can even be difficult to handle the coordinates explicitly. Takano et al. [15] proposed a geometrical algorithm for finding the projection onto a nonparametric e-mixture model, where the model subspace is spanned by several empirical distributions. The algorithm, which is derived from the generalized Pythagorean theorem, depends only on the values of divergences.
September 27, 2019
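As a concrete illustration of the gradient-based approach mentioned in the abstract, the following is a minimal sketch, not the algorithm of Takano et al., of the m-projection of a discrete distribution onto an e-mixture family, computed by gradient descent on the mixture weights. All distributions, dimensions, and step sizes here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's algorithm): m-projection of a
# discrete distribution q onto the e-mixture family p_w(x) ∝ prod_i p_i(x)^{w_i}
# by gradient descent on the mixture weights w.
rng = np.random.default_rng(0)
K, n = 3, 10                        # number of components, alphabet size (illustrative)
P = rng.dirichlet(np.ones(n), K)    # component distributions p_1..p_K (rows)
q = rng.dirichlet(np.ones(n))       # the point to be projected

def e_mixture(w):
    """e-mixture: normalized geometric mean of the components."""
    log_p = w @ np.log(P)           # sum_i w_i log p_i(x)
    log_p -= log_p.max()            # numerical stability
    p = np.exp(log_p)
    return p / p.sum()

def kl(q, p):
    """KL divergence D(q || p) for strictly positive distributions."""
    return np.sum(q * (np.log(q) - np.log(p)))

# Gradient descent on unconstrained logits; softmax keeps w on the simplex.
theta = np.zeros(K)
lr, eps = 0.5, 1e-6
for _ in range(500):
    w = np.exp(theta) / np.exp(theta).sum()
    base = kl(q, e_mixture(w))
    grad = np.zeros(K)
    for i in range(K):              # numerical gradient of D(q || p_w) w.r.t. theta
        t = theta.copy(); t[i] += eps
        wi = np.exp(t) / np.exp(t).sum()
        grad[i] = (kl(q, e_mixture(wi)) - base) / eps
    theta -= lr * grad

w = np.exp(theta) / np.exp(theta).sum()
print("weights:", w, "KL:", kl(q, e_mixture(w)))
```

The sketch relies on being able to evaluate the divergence at perturbed parameter values, which presumes an explicit coordinate system; in the nonparametric settings discussed above, exactly this step becomes infeasible, which is what motivates an algorithm that depends only on divergence values.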