Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games