Estimating the Rate-Distortion Function by Wasserstein Gradient Descent

Neural Information Processing Systems 

In the theory of lossy compression, the rate-distortion (R-D) function R(D) describes how much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion). Obtaining R(D) for a given data source establishes the fundamental performance limit for all compression algorithms on that source. We propose a new method to estimate R(D) from the perspective of optimal transport. Unlike the classic Blahut--Arimoto algorithm, which fixes the support of the reproduction distribution in advance, our Wasserstein gradient descent algorithm learns the support of the optimal reproduction distribution by moving particles. We prove its local convergence and analyze the sample complexity of our R-D estimator based on a connection to entropic optimal transport.
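The idea of moving particles can be illustrated with a minimal sketch. Under squared-error distortion, the R-D Lagrangian at multiplier λ can be written as F(ν) = E_x[−log E_ν[e^{−λ d(x,Y)}]], where ν is the reproduction distribution; representing ν by n uniformly weighted particles and descending the gradient of F with respect to their locations moves the support, in contrast to Blahut--Arimoto's fixed grid. The source (a standard Gaussian), the hyperparameters, and all variable names below are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 2000                       # number of source samples (assumed)
x = rng.normal(size=(m, 1))    # standard Gaussian source, for illustration
n = 20                         # particles representing the reproduction dist.
y = rng.normal(size=(n, 1))    # initial particle locations
lam = 2.0                      # Lagrange multiplier (rate vs. distortion)

def rd_lagrangian(x, y, lam):
    """Empirical F(nu) = mean_i[-log mean_j e^{-lam*d_ij}] in nats."""
    d = (x - y.T) ** 2                         # (m, n) squared-error distortion
    logk = -lam * d                            # log kernel
    mx = logk.max(axis=1, keepdims=True)       # stabilized log-mean-exp
    lme = mx.squeeze(1) + np.log(np.exp(logk - mx).mean(axis=1))
    return -lme.mean(), d, logk

loss0 = rd_lagrangian(x, y, lam)[0]
step = 0.05
for _ in range(200):
    _, d, logk = rd_lagrangian(x, y, lam)
    # Optimal conditional q(j|i) is a softmax over particles
    # (the Blahut--Arimoto inner step).
    q = np.exp(logk - logk.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    # Gradient of F w.r.t. particle locations; the update moves the
    # support of the reproduction distribution (particle gradient descent).
    grad = 2 * lam * (q[..., None] * (y[None] - x[:, None])).mean(axis=0)
    y = y - step * grad

# Read off the (rate, distortion) point implied by the final particles.
loss1, d, logk = rd_lagrangian(x, y, lam)
q = np.exp(logk - logk.max(axis=1, keepdims=True))
q /= q.sum(axis=1, keepdims=True)
D = (q * d).sum(axis=1).mean()                         # expected distortion
with np.errstate(divide="ignore", invalid="ignore"):
    r_terms = np.where(q > 0, q * np.log(q * n), 0.0)  # KL(q || uniform)
R = r_terms.sum(axis=1).mean()                         # rate in nats
print(f"R = {R / np.log(2):.3f} bits, D = {D:.3f}")
```

At the softmax-optimal conditional, the identity F = R + λD holds exactly, so sweeping λ traces out (an upper bound on) the R-D curve; this sketch omits the paper's convergence and sample-complexity analysis.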