Sample-Efficient Reinforcement Learning of Partially Observable Markov Games