A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks