Review for NeurIPS paper: Learning to Incentivize Other Learning Agents

Weaknesses: I have two concerns, on (1) baselines and (2) scalability. IA is a reasonable baseline, and it is nice to see that LIO outperforms it, but the results would be more convincing if more benchmark algorithms were included. Mutual information, for example, can also be viewed as an approximation to accounting for other agents' future policy changes, and has shown strong performance in Harvest/Cleanup with a large number of agents. On scalability: could one simply learn a value function conditioned on the rewards received by different agents (in the same spirit as DDPG), so as to avoid computing second-order gradients? These questions arose while I read the paper, and I believe a more in-depth discussion and experiments would further consolidate the contribution of this work.
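To make the last question concrete, the alternative I have in mind is sketched below (a toy NumPy illustration with made-up names and sizes, not the paper's method): a critic that takes the incentive rewards received by the agents as an extra input and is trained with ordinary TD targets, so that the dependence on incentives enters only through the critic's input and no gradient needs to flow through the recipients' policy updates.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, N_AGENTS = 4, 2          # toy sizes, purely illustrative
FEAT_DIM = STATE_DIM + N_AGENTS     # state features + received incentives
GAMMA, LR = 0.95, 0.05

# Linear critic V(s, r_inc): conditioned on the incentive rewards each
# agent received, instead of differentiating through the recipients'
# learning updates (which is what requires second-order gradients).
w = np.zeros(FEAT_DIM)

def features(state, incentives):
    return np.concatenate([state, incentives])

def td_update(w, s, inc, r_env, s_next, inc_next):
    """One first-order TD(0) step on the incentive-conditioned critic."""
    phi, phi_next = features(s, inc), features(s_next, inc_next)
    target = r_env + GAMMA * (w @ phi_next)
    delta = target - w @ phi
    return w + LR * delta * phi, delta

# Toy rollout: random transitions with a fixed environment reward, just
# to show that training is a plain first-order update.
for _ in range(2000):
    s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
    inc = rng.uniform(0.0, 1.0, N_AGENTS)
    inc_next = rng.uniform(0.0, 1.0, N_AGENTS)
    w, delta = td_update(w, s, inc, 1.0, s_next, inc_next)
```

The point of the sketch is only that such a critic is trained with first-order gradients; whether conditioning on received rewards can actually capture the recipients' long-run policy change, as LIO's second-order formulation does, is exactly what I would like the authors to discuss.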