Robust and Diverse Multi-Agent Learning via Rational Policy Gradient