Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration