Review for NeurIPS paper: Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Neural Information Processing Systems 

Additional Feedback: I really enjoyed this paper, so my comments mostly concern making the derivations more readable. The two steps I got hung up on while reading were the marginalization step, which moves from the weights beta to the weights g, and the step where the matrix A(g) is defined. In both cases, I think a prose description of exactly what the transformation does would help. For the weights g, I find the direct interpretation (the last expression in the line defining g_k(a | j)) more intuitive than the definition in terms of beta. It is not obvious how one gets from one to the other, especially with the inverse migrating out of the summation.