Reviews: Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Neural Information Processing Systems 

This paper proposes a mechanism for maintaining distributions over Q-values (called Q-posteriors) by defining the value function (the V-posterior) to be a Wasserstein barycenter of Q-posteriors and defining the TD update to be a Wasserstein barycenter of the current Q-posterior with an estimated posterior based on the value function. These distributions are intended to represent uncertainty about the Q-function and they enable more nuanced definitions of the "optimal" (w.r.t.

Contributions seem to be:

1. A means of propagating uncertainty about Q-values via Wasserstein barycenters (Equations 2 & 3).
2. A proof that a modified version of the proposed algorithm is PAC-MDP in the average loss setting (Theorems 5.1 and 5.2).

1. The paper is fairly clearly written and easy enough to understand.
2. The idea of propagating uncertainty via Wasserstein barycenters is interesting and suggests several concrete realizations.
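
To make the barycenter-based propagation described in the summary concrete, here is a minimal sketch of one possible realization. It is not the paper's Equations 2 & 3. It assumes one-dimensional Gaussian Q-posteriors, for which the 2-Wasserstein barycenter is again Gaussian with mean and standard deviation equal to the weighted averages of the input means and standard deviations. The function names, the action weights, the step size alpha, and the target posterior "reward plus discounted V-posterior" are all illustrative assumptions.

```python
import numpy as np


def gaussian_w2_barycenter(means, stds, weights):
    """2-Wasserstein barycenter of 1-D Gaussians.

    In one dimension the barycenter's quantile function is the weighted
    average of the input quantile functions, so for Gaussians both the
    mean and the standard deviation average linearly.
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to a convex combination
    return float(weights @ means), float(weights @ stds)


def v_posterior(q_means, q_stds, action_weights):
    """V-posterior as a barycenter of the Q-posteriors of one state.

    ASSUMPTION: action_weights is a hypothetical weighting over actions;
    how the weights are chosen is not specified here.
    """
    return gaussian_w2_barycenter(q_means, q_stds, action_weights)


def td_update(q_mean, q_std, reward, gamma, v_mean, v_std, alpha):
    """TD-style update as a barycenter of the current Q-posterior and a target.

    ASSUMPTION: the target posterior is the next-state V-posterior scaled by
    gamma and shifted by the reward, and alpha plays the role of a step size.
    """
    target_mean = reward + gamma * v_mean
    target_std = gamma * v_std
    return gaussian_w2_barycenter(
        [q_mean, target_mean], [q_std, target_std], [1.0 - alpha, alpha]
    )


# Illustrative numbers only: two actions in the next state.
v_mean, v_std = v_posterior([1.0, 2.0], [0.5, 1.0], [0.25, 0.75])
new_q_mean, new_q_std = td_update(
    q_mean=0.0, q_std=2.0, reward=1.0, gamma=0.9,
    v_mean=v_mean, v_std=v_std, alpha=0.5,
)
```

Under these assumptions the update moves the Q-posterior's mean toward the target and also interpolates its spread, so uncertainty in the next state's values flows back into the current estimate rather than being discarded.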