Data Parallelism and Distributed Deep Learning at production scale (part 2)

#artificialintelligence 

Lastly, our optimiser is wrapped by Horovod's implementation for distributed optimisation (which handles the all-gather and all-reduce MPI operations). We next assign training callbacks to GPU processors based on each processor's (unique) global rank. By default, rank 0 is designated as the root node. Some operations need to run on only a single node (for example, using a model checkpoint to save model weights to file). Each processor effectively runs its own training job, optionally printing training accuracy, loss, and custom metrics to CloudWatch.
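The rank-gating described above can be sketched as follows. This is a minimal, Horovod-free illustration: the `rank` argument and callback names are hypothetical stand-ins (in Horovod itself, the global rank would come from `hvd.rank()` and the optimiser wrapping from `hvd.DistributedOptimizer`).

```python
def build_callbacks(rank: int, checkpoint_path: str = "checkpoints/model.h5") -> list:
    """Assemble the training callbacks for one worker, given its global rank."""
    # Callbacks every worker runs, e.g. logging loss/accuracy metrics.
    callbacks = ["metric_logger"]
    # File-writing operations run only on the root node (rank 0), so
    # concurrent workers never write to the same checkpoint file.
    if rank == 0:
        callbacks.append(f"model_checkpoint -> {checkpoint_path}")
    return callbacks

# Every worker logs metrics; only rank 0 also checkpoints.
print(build_callbacks(rank=0))
print(build_callbacks(rank=3))
```

Gating on the global rank this way keeps the per-worker training loops identical everywhere except for the handful of root-only side effects.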