Understanding Data Parallelism in Machine Learning – Telesens

@machinelearnbot 

Data parallelism is a popular technique used to speed up training on large mini-batches when each mini-batch is too large to fit on a GPU. Under data parallelism, a mini-batch is split up into smaller sized batches that are small enough to fit on the memory available on different GPUs on the network. Each GPU holds an identical copy of the network parameters and runs the forward and backward pass. At the end of the backward pass, each GPU sends the computed gradients to a parameter server. The parameter server aggregates the gradients and computes the updates to the network parameters using some variant of Stochastic Gradient Descent.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found