Goto

Collaborating Authors

 hershkowitz


Approximate State Abstraction for Markov Games

arXiv.org Artificial Intelligence

This paper introduces state abstraction for two-player zero-sum Markov games (TZMGs), where the payoffs for the two players are determined by the state representing the environment and their respective actions, with state transitions following Markov decision processes. For example, in games like soccer, the value of actions changes according to the state of play, and thus such games should be described as Markov games. In TZMGs, as the number of states increases, computing equilibria becomes more difficult. Therefore, we consider state abstraction, which reduces the number of states by treating multiple different states as a single state. There is a substantial body of research on finding optimal policies for Markov decision processes using state abstraction. However, in the multi-player setting, the game with state abstraction may yield different equilibrium solutions from those of the ground game. To evaluate the equilibrium solutions of the game with state abstraction, we derived bounds on the duality gap, which represents the distance from the equilibrium solutions of the ground game. Finally, we demonstrate our state abstraction with Markov Soccer, compute equilibrium policies, and examine the results.


Hershkowitz

AAAI Conferences

Massive state spaces are ubiquitous throughout planning and reinforcement learning (RL) domains: agents involved in furniture assembly, cooking automation and backgammon must grapple with problem formalisms that are much too expansive to solve by conventional tabular approaches. However, modern tabular planning and RL techniques bypass this difficulty by using propositional functions to transfer knowledge across states -- both within and across problem instances -- to solve for near optimal behaviors in very large state spaces. Here we present a means by which useful propositional functions can be inferred from observations of transition dynamics. Our approach is based upon distilling salient relational values between pairs of objects. We then use these learned propositional functions to free the RL algorithm deterministic object-oriented RMAX (DOORMAX) of its dependence on expert-provided propositional functions. We also empirically demonstrate high correspondence between these learned propositional functions and expert-provided propositional functions. Our novel DOORMAX algorithm performs at a level near that of classic DOORMAX.