Reviews: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

Neural Information Processing Systems 

The article extends previous work on primal-dual optimisation for policy evaluation in RL to the distributed policy evaluation setting, maintaining attractive convergence rates for the extended algorithm. Overall, the article gradually builds its contribution and is reasonably easy to follow. A few exceptions to this are the start of the related work section, the dropping of citations in lists, and the lack of an explanation of the repeatedly mentioned 'convex-concave saddle-point problem'. The authors equate averaging over 'agents' with averaging over 'space', which is something of an imprecise metaphorical stretch in my view. The contribution is honestly delineated (collaborative distributed policy evaluation with local rewards), and relevant related work is cited clearly.