Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot