Bandit Learning in Concave N-Person Games
Mario Bravo, David Leslie, Panayotis Mertikopoulos
Neural Information Processing Systems
The bane of decision-making in an unknown environment is regret: no one wants to realize in hindsight that the decision policy they employed was strictly inferior to a plain policy prescribing the same action throughout. For obvious reasons, this issue becomes considerably more intricate when the decision-maker is subject to situational uncertainty and the "fog of war": when the only information at the optimizer's disposal is the reward obtained from a given action (the so-called "bandit" framework), is it even possible to design a no-regret policy?
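The regret described above compares the rewards actually collected with those of the best single action held fixed throughout. A standard formalization (assumed here, not quoted from the paper) is Reg(T) = max_x sum_{t=1..T} u_t(x) - sum_{t=1..T} u_t(x_t), and a policy is called no-regret when Reg(T)/T tends to 0. The minimal Python sketch below evaluates this quantity after the fact. The function name `empirical_regret`, the example reward functions, and the finite list of candidate fixed actions (a discretization of what is in general a continuous action set) are hypothetical illustrations. Note that the computation uses full knowledge of every u_t in hindsight, which is precisely the information a bandit learner does not have while playing.

```python
from typing import Callable, Sequence


def empirical_regret(
    rewards: Sequence[Callable[[float], float]],
    played: Sequence[float],
    candidates: Sequence[float],
) -> float:
    """Hindsight regret of `played` against the best fixed action in `candidates`.

    Hypothetical helper: rewards[t] is the reward function revealed at round t,
    played[t] is the action the learner actually chose at round t, and
    `candidates` is an assumed finite set of constant policies to compare with.
    """
    if len(rewards) != len(played):
        raise ValueError("need exactly one played action per round")
    # Total reward the learner actually collected.
    earned = sum(u(x) for u, x in zip(rewards, played))
    # Total reward of the best policy that plays the same action every round.
    best_fixed = max(sum(u(c) for u in rewards) for c in candidates)
    return best_fixed - earned


# Usage: a concave reward peaked at 0.5, identical over T = 4 rounds.
rewards = [lambda x: -(x - 0.5) ** 2] * 4
print(empirical_regret(rewards, [0.0, 0.25, 0.5, 0.5], [0.0, 0.5, 1.0]))  # 0.3125
```

Here the learner loses 0.25 + 0.0625 relative to always playing 0.5, so its regret is 0.3125; a no-regret policy would keep this total growing slower than the number of rounds.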
Feb-12-2026, 17:47:03 GMT
- Country:
- Europe > France (0.04)
- North America
- Canada > Quebec
- Montreal (0.04)
- United States (0.04)
- Technology:
- Information Technology > Game Theory (1.00)