Multi-agent cooperation through learning-aware policy gradients