Learning Near-Pareto-Optimal Conventions in Polynomial Time
Wang, Xiaofeng; Sandholm, Tuomas
Neural Information Processing Systems
We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different preferences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal algorithms for both the perfect monitoring setting and the imperfect monitoring setting (where the agents only observe their own payoffs and the joint actions).
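To make the target of learning concrete, the following is a minimal sketch that enumerates the strict pure Nash equilibria of a small two-player coordination game and keeps those that are not Pareto-dominated by another strict equilibrium. It is not the learning algorithm from the paper: it assumes the payoff matrices are known and noiseless, whereas the paper's agents must learn them from noisy observations. The matrices `ROW_PAYOFF` and `COL_PAYOFF` and both function names are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical 3x3 coordination game of non-identical interest:
# both players want to match actions, but they rank the matches differently.
ROW_PAYOFF = [[3, 0, 0],
              [0, 2, 0],
              [0, 0, 1]]
COL_PAYOFF = [[2, 0, 0],
              [0, 3, 0],
              [0, 0, 1]]


def strict_nash_equilibria(row_payoff, col_payoff):
    """Return joint actions where each player's action is the unique best response."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player strictly loses by any unilateral deviation.
        row_ok = all(row_payoff[r][c] > row_payoff[r2][c]
                     for r2 in range(n_rows) if r2 != r)
        # Column player strictly loses by any unilateral deviation.
        col_ok = all(col_payoff[r][c] > col_payoff[r][c2]
                     for c2 in range(n_cols) if c2 != c)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


def pareto_optimal(equilibria, row_payoff, col_payoff):
    """Keep equilibria not Pareto-dominated by another equilibrium in the list."""
    def payoffs(e):
        return (row_payoff[e[0]][e[1]], col_payoff[e[0]][e[1]])

    def dominates(a, b):
        pa, pb = payoffs(a), payoffs(b)
        return (all(x >= y for x, y in zip(pa, pb))
                and any(x > y for x, y in zip(pa, pb)))

    return [e for e in equilibria
            if not any(dominates(other, e) for other in equilibria)]


strict = strict_nash_equilibria(ROW_PAYOFF, COL_PAYOFF)
best = pareto_optimal(strict, ROW_PAYOFF, COL_PAYOFF)
print(strict)
print(best)
```

In this hypothetical game the two surviving equilibria are ranked differently by the two players, which is the kind of conflicting preference among equilibria that the abstract refers to.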
Dec-31-2004