How to train your deep learning models in a distributed fashion.
Deep learning algorithms are well suited to large data sets, but training deep networks also demands large amounts of compute. With GPUs and TPUs available on a pay-per-use basis or even for free (for example, Google Colab), it is now practical to train a large neural network in the cloud, say ResNet-152 (152 layers) on the ImageNet dataset of roughly 14 million images. But is a single GPU-enabled multi-core machine enough to train such huge models? Technically yes, but training might take weeks. So how do we reduce the training time?
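The answer this post works toward is data parallelism: replicate the model across several devices, feed each replica a different slice of the batch, and average the gradients. As a taste of what follows, here is a minimal sketch of single-machine, multi-GPU data parallelism using TensorFlow's tf.distribute.MirroredStrategy; the ResNet-152 model and the random stand-in data are illustrative assumptions, not a real ImageNet training setup.

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and
# averages gradients across replicas after each training step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored on all GPUs.
    # ResNet-152 here is just an example model, per the text above.
    model = tf.keras.applications.ResNet152(weights=None, classes=1000)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
        loss="sparse_categorical_crossentropy",
    )

# Random stand-in data; a real run would use the ImageNet input pipeline.
# The global batch size scales with the replica count so each GPU keeps
# the same per-device batch size.
global_batch = 32 * strategy.num_replicas_in_sync
images = tf.random.uniform((global_batch, 224, 224, 3))
labels = tf.random.uniform((global_batch,), maxval=1000, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(global_batch)

model.fit(dataset, epochs=1)
```

With N GPUs, each step processes N times the per-device batch, which is what cuts wall-clock training time; the later sections expand on how this scales beyond a single machine.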