We thank all the reviewers for their constructive feedback. Citations refer to references in the paper and to the additional ones provided below.

"I do agree that full information feedback is hard to expect in real scenarios,... However, the current ... Is there an application where this is a more realistic assumption?"

The main motivation for our model is a setting that lies between full-information feedback and bandit feedback. The proposed feedback model also arises in other practical applications.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The paper proposes a fairer optimization criterion, "regularized maximin", for centralized multi-agent MDPs. The idea, taken from the networking literature, is elegant. The authors also propose an iterative optimization method that scales somewhat better than linear programming. The description of the transition model (lines 69-79) seems unnecessarily detailed.
Fairness in Multi-Agent Sequential Decision-Making
We define a fairness solution criterion for multi-agent decision-making problems in which agents have local interests. The new criterion aims to maximize the performance of the worst-off agent while also taking overall performance into account. We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. The game-theoretic approach formulates the fairness optimization as a two-player zero-sum game and employs an iterative algorithm to find a Nash equilibrium, which corresponds to an optimal fairness policy.
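The zero-sum view can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a finite set of candidate policies whose per-agent values are already known, assumes the regularized objective takes the form "worst agent's value plus (eps / n) times the sum of all agents' values", and uses a generic multiplicative-weights scheme for the adversary. The names `solve_regularized_maximin`, `values`, `eps`, and `rounds` are hypothetical and chosen only for this illustration.

```python
import math


def solve_regularized_maximin(values, eps=0.1, rounds=2000):
    """Approximate a maximin mixture over candidate policies.

    values[p][i] is the (assumed known) value of agent i under policy p.
    Assumed objective: min_i v_i + (eps / n) * sum_j v_j.
    """
    num_policies, num_agents = len(values), len(values[0])
    # Fold the regularizer into the payoffs; it does not depend on which
    # agent the adversary picks, so it is added to every column.
    payoffs = [[v + eps / num_agents * sum(row) for v in row] for row in values]
    span = max(map(max, payoffs)) - min(map(min, payoffs))
    if span == 0:
        span = 1.0
    eta = math.sqrt(8 * math.log(num_agents) / rounds) / span

    log_w = [0.0] * num_agents   # adversary's log-weights over agents
    counts = [0] * num_policies  # how often each policy was the best response
    for _ in range(rounds):
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        q = [x / total for x in w]
        # Policy player best-responds to the adversary's current mixture.
        best = max(
            range(num_policies),
            key=lambda p: sum(q[i] * payoffs[p][i] for i in range(num_agents)),
        )
        counts[best] += 1
        # Adversary shifts weight toward agents that received low payoff.
        for i in range(num_agents):
            log_w[i] -= eta * payoffs[best][i]

    mix = [c / rounds for c in counts]
    agent_values = [
        sum(mix[p] * values[p][i] for p in range(num_policies))
        for i in range(num_agents)
    ]
    return mix, agent_values


if __name__ == "__main__":
    # Two policies, each favoring a different agent.
    mix, agent_values = solve_regularized_maximin([[1.0, 0.0], [0.0, 1.0]])
    print(mix, agent_values)
```

In this toy instance the averaged best responses should split roughly evenly between the two policies, since any uneven split lowers the worse-off agent's value. A full treatment would replace the enumerated policy list with a best-response computation over the MDP itself.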