Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction