Accelerating I/O bound deep learning on shared storage
When training a neural network, one typically strives to make the GPU the bottleneck: all data should be read from disk, pre-processed, and transferred to the GPU fast enough that the GPU is busy 100% of the time computing the next improved version of the model. An increasing trend we see at RiseML is that pre-processing, and especially reading the training data from disk, becomes the bottleneck instead. This is caused by multiple factors, including faster GPUs, more efficient model architectures, and larger datasets, especially for video and image processing. As a result, the GPUs sit idle much of the time, waiting for the next batch of data to work on.
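The pipeline described above can be sketched in plain Python. This is a minimal illustration with simulated timings (the function names and sleep durations are hypothetical, not from any real framework): a background thread reads batches ahead of time so that the "compute" step no longer waits on each "disk read". Real frameworks provide the same idea via e.g. `tf.data` prefetching or PyTorch `DataLoader` worker processes.

```python
import queue
import threading
import time

def load_batch(i):
    # Hypothetical stand-in for a disk read -- the dominant cost
    # in an I/O bound pipeline.
    time.sleep(0.05)
    return list(range(i, i + 4))

def train_step(batch):
    # Hypothetical stand-in for GPU compute, faster than loading.
    time.sleep(0.01)
    return sum(batch)

def train_sequential(n_batches):
    # GPU waits for every read: total time ~ n * (load + compute).
    return [train_step(load_batch(i)) for i in range(n_batches)]

def train_prefetched(n_batches, depth=4):
    # A background thread reads ahead into a bounded queue, so
    # loading the next batch overlaps with computing the current one.
    q = queue.Queue(maxsize=depth)

    def producer():
        for i in range(n_batches):
            q.put(load_batch(i))

    threading.Thread(target=producer, daemon=True).start()
    return [train_step(q.get()) for _ in range(n_batches)]
```

With prefetching, total wall time approaches the cost of the slower stage alone instead of the sum of both stages; when loading still dominates, as in the trend above, the GPU remains idle and the storage layer itself must get faster.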
Feb-16-2018, 19:56:35 GMT