Near-Optimal Reinforcement Learning with Self-Play