Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality

Caprio, Rocco, Johansen, Adam M

arXiv.org Artificial Intelligence 

The Expectation Maximization (EM) algorithm has been a cent ral part of the statistician's toolbox since being formalised by [ 22 ] as an effective general computational solution to the marginal maximum likelihood problem. At that time the algor ithm had been proposed previously in numerous special contexts, including that of empirical Bayes [ 27 ]. Empirical Bayes methods have received considerable attention in the m odern machine learning literature, where they are widely used to specify hyper-paramete rs in high-dimensional models. In recent years there has been a great deal of interest within the Bayesian statistics and machine learning communities in the construction of gradie nt flows, especially Wasserstein gradient flows, which underlie Langevin Monte Carlo algorit hms. Some recent work has focussed on the intersection of empirical Bayes type method s and gradient flow-based algorithms. Our aim is to demonstrate here that some of the tools, particularly those emerging from optimal transport and Wasserstein geometry, which hav e been developed in the context of these modern computational methods provide a natura l approach to the analysis of the EM algorithm itself--and many of its approximations. S uch analysis is quite direct, requires limited further technical work and yields state-o f-the-art conclusions under conditions which are, if anything, weaker than those ordinaril y employed in the quantitative analysis of EM algorithms. 1 In this paper we utilize the connection between EM and a coord inate-wise minimization algorithm applied to the free energy functional identified b y [ 43 ] to provide non-asymptotic error bounds for EM algorithms under an extended form of the l og-Sobolev inequality. To do this, we extend an argument commonly used to understand Eu clidean coordinate descent algorithms by comparison with gradient descent via the desc ent lemma [ 9, 8, 10 ], together with recently developed results for using and understandin g gradients on the product of Euclidean and Wasserstein spaces [ 13 ].