Provable Memory Efficient Self-Play Algorithm for Model-free Reinforcement Learning