Learning to Coordinate Efficiently: A Model-based Approach

Brafman, R. I., Tennenholtz, M.

arXiv.org Artificial Intelligence 

Players participating in common-interest stochastic games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.

The distributed nature of multi-agent systems makes the problem of learning to act in an unknown environment more difficult, because the agents must coordinate both their learning process and their action choices. However, the need to coordinate is not restricted to distributed agents: it also arises naturally among self-interested agents in certain environments. A good model for such environments is the common-interest stochastic game (CISG).

A stochastic game (Shapley, 1953) is a model of multi-agent interactions consisting of a finite or infinite number of stages, in each of which the agents play a one-shot strategic-form game. The identity of each stage depends stochastically on the previous stage and on the actions performed by the agents in that stage. The goal of each agent is to maximize some function of its reward stream: either its average reward or its sum of discounted rewards. A CISG is a stochastic game in which, at each point, the payoff of all agents is identical. Various algorithms for learning in CISGs have been proposed in the literature.
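To make these definitions concrete, here is a minimal Python sketch restricted to the simplest CISG: a repeated game, i.e. a stochastic game with a single stage game. It assumes deterministic payoffs, that every agent observes the joint action and the shared payoff, and that all agents agree in advance on an ordering of joint actions. The names `discounted_return`, `average_reward` and `coordinate` are hypothetical, and the explore-then-exploit scheme illustrates model-based coordination in general; it is not the authors' algorithm. Each agent builds its own model of the shared payoff by trying every joint action once, then independently plays its component of the best joint action in that model.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

JointAction = Tuple[int, ...]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of discounted rewards: r_0 + gamma * r_1 + gamma**2 * r_2 + ..."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def average_reward(rewards: Sequence[float]) -> float:
    """Average reward over a finite prefix of the reward stream."""
    return sum(rewards) / len(rewards)


def coordinate(num_actions: Sequence[int],
               payoff: Callable[[JointAction], float]) -> JointAction:
    """Hypothetical explore-then-exploit coordination in a repeated CISG.

    Assumes deterministic payoffs, that every agent observes the joint action
    and the shared payoff, and that all agents enumerate joint actions in the
    same (lexicographic) order. Not the paper's algorithm.
    """
    joint_actions = list(itertools.product(*(range(n) for n in num_actions)))
    # One payoff model per agent; in a CISG all agents observe the same payoff.
    models: List[Dict[JointAction, float]] = [{} for _ in num_actions]

    # Exploration: at stage k every agent plays its component of
    # joint_actions[k], so each joint action is tried exactly once.
    for joint in joint_actions:
        reward = payoff(joint)
        for model in models:
            model[joint] = reward

    # Exploitation: agent i plays component i of the best joint action in its
    # own model. max() returns the first maximum in insertion order, so every
    # agent breaks ties identically and the components form one joint action.
    return tuple(max(model, key=model.__getitem__)[i]
                 for i, model in enumerate(models))


if __name__ == "__main__":
    # Two-agent coordination game with two optimal joint actions.
    game = {(0, 0): 10.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 10.0}
    print(coordinate([2, 2], game.__getitem__))  # prints (0, 0)
    print(discounted_return([1, 1, 1], 0.5), average_reward([1, 2, 3]))  # prints 1.75 2.0
```

In the example game both (0, 0) and (1, 1) are optimal; without a shared tie-breaking order, one agent could settle on action 0 and the other on action 1, earning 0. Here exploration takes exactly one stage per joint action; with stochastic payoffs or several stage games, a model-based learner would also need repeated samples and an estimate of the transition probabilities.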
