Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization