Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications