Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control