Accelerating Parallel Stochastic Gradient Descent via Non-blocking Mini-batches