Reviews: Bandit Learning in Concave N-Person Games

Oct-7-2024, 09:24:13 GMT–Neural Information Processing Systems

Context: It is a classic result that the empirical frequencies of actions for players running regret-minimization algorithms converge to a coarse correlated equilibrium (CCE). CCEs are not necessarily desirable solution concepts because they sometimes admit irrational behavior. For monotone games, it is known that the empirical frequencies converge to Nash equilibrium for agents playing follow-the-regularized-leader (FTRL). Recently, Mertikopoulos et al. proved that the sequence of plays for FTRL converges to Nash equilibrium in such games; in fact, they prove something more general that goes beyond concave potential games. This work considers the case where each agent can only observe bandit feedback.
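To make the first claim above concrete, here is a minimal sketch of two players running Hedge (exponential weights, one instance of FTRL) against each other in rock-paper-scissors. It is an illustration only and not the algorithm from the paper under review: it assumes full-information feedback (each player sees the expected payoff of every action), whereas the paper studies bandit feedback. The game, the function name `hedge_self_play`, the step size `eta`, and the biased starting scores are all assumptions chosen for the example. In a two-player zero-sum game such as this one, the time-averaged strategies of no-regret learners approach a Nash equilibrium, which here is uniform play.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors.
# The game is zero-sum, so the column player receives -A.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def hedge_self_play(A, steps=50_000, eta=0.01):
    """Run Hedge for both players with full-information (expected) payoffs.

    Returns the time-averaged strategies and the final-round strategies.
    """
    n, m = A.shape
    # Biased starting scores (an arbitrary choice) so that play
    # begins away from the uniform equilibrium.
    logits_x = np.zeros(n)
    logits_x[0] = 2.0
    logits_y = np.zeros(m)
    logits_y[1] = 1.0

    sum_x = np.zeros(n)
    sum_y = np.zeros(m)
    for _ in range(steps):
        x = softmax(logits_x)
        y = softmax(logits_y)
        sum_x += x
        sum_y += y
        # Each player adds the expected payoff of every pure action
        # against the opponent's current mixed strategy.
        logits_x += eta * (A @ y)
        logits_y += eta * (-A.T @ x)

    return sum_x / steps, sum_y / steps, x, y


if __name__ == "__main__":
    avg_x, avg_y, last_x, last_y = hedge_self_play(A)
    print("time-averaged row strategy:   ", np.round(avg_x, 3))
    print("time-averaged column strategy:", np.round(avg_y, 3))
    print("final-round row strategy:     ", np.round(last_x, 3))
```

The function returns both the averaged and the final-round strategies so the two notions of convergence mentioned in the review (empirical frequencies versus the sequence of plays) can be inspected side by side.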

bandit learning, concave n-person game, converge

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.57)