Learning to Coordinate Efficiently: A Model-based Approach
Brafman, R. I., Tennenholtz, M.
arXiv.org Artificial Intelligence
Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms. The distributed nature of such systems makes the problem of learning to act in an unknown environment more difficult because the agents must coordinate both their learning process and their action choices. However, the need to coordinate is not restricted to distributed agents, as it arises naturally among self-interested agents in certain environments. A good model for such environments is that of a common-interest stochastic game (CISG). A stochastic game (Shapley, 1953) is a model of multi-agent interactions consisting of multiple finite or infinite stages, in each of which the agents play a one-shot strategic form game. The identity of each stage depends stochastically on the previous stage and the actions performed by the agents in that stage. The goal of each agent is to maximize some function of its reward stream: either its average reward or its sum of discounted rewards. A CISG is a stochastic game in which at each point the payoff of all agents is identical. Various algorithms for learning in CISGs have been proposed in the literature.
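As a rough illustration of the definitions above, the following minimal Python sketch represents a common-interest stochastic game as a table of shared payoffs plus stochastic stage transitions, and computes the two objectives mentioned (average reward and sum of discounted rewards). The class name `CISG`, the helper functions, the single-stage toy game, and all numbers are assumptions made for this example; they are not taken from the paper and do not implement its learning algorithms.

```python
import random
from dataclasses import dataclass


@dataclass
class CISG:
    """Hypothetical container for a toy common-interest stochastic game."""
    # payoff[stage][joint_action] -> the single reward received by every agent
    payoff: dict
    # transition[stage][joint_action] -> list of (next_stage, probability)
    transition: dict

    def step(self, stage, joint_action, rng):
        """Play one stage game; return the shared reward and the next stage."""
        reward = self.payoff[stage][joint_action]
        next_stages, probs = zip(*self.transition[stage][joint_action])
        next_stage = rng.choices(next_stages, weights=probs, k=1)[0]
        return reward, next_stage


def discounted_return(rewards, gamma):
    """Sum of discounted rewards: sum over t of gamma**t * r_t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def average_reward(rewards):
    """Average reward over a finite reward stream."""
    return sum(rewards) / len(rewards)


# Illustrative one-stage coordination game (assumed numbers):
# two agents are rewarded only when they pick the same action.
joint_actions = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
game = CISG(
    payoff={"s0": {ja: 1.0 if ja[0] == ja[1] else 0.0 for ja in joint_actions}},
    transition={"s0": {ja: [("s0", 1.0)] for ja in joint_actions}},
)

rng = random.Random(0)
stage = "s0"
rewards = []
for joint_action in [("a", "a"), ("a", "b"), ("b", "b")]:
    reward, stage = game.step(stage, joint_action, rng)
    rewards.append(reward)
```

Because the payoff is stored once per joint action rather than once per agent, the identical-payoff property of a CISG holds by construction in this sketch; a general stochastic game would instead need one payoff entry per agent.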
Jun-26-2011
- Country:
- Asia > Middle East
- Israel (0.04)
- North America > United States (0.04)
- Genre:
- Research Report (0.50)
- Technology: