Large Scale Distributed Deep Networks

Mar-14-2024, 09:50:55 GMT – Neural Information Processing Systems

Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS.
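The abstract names Downpour SGD but does not spell out its mechanics, so the following is only a minimal sketch of the general idea of asynchronous SGD with several model replicas, not the DistBelief implementation. It assumes a single shared parameter store that replicas read from and write to without waiting for one another. The names `ParameterServer`, `run_replica`, and `train`, the toy linear model, and all hyperparameters are hypothetical choices made for illustration.

```python
import random
import threading


class ParameterServer:
    """Hypothetical shared parameter store. Replicas never wait for each other."""

    def __init__(self, dim, lr):
        self.params = [0.0] * dim
        self.lr = lr
        # The lock only keeps one individual read or update consistent;
        # it does not impose any ordering between replicas.
        self._lock = threading.Lock()

    def fetch(self):
        with self._lock:
            return list(self.params)

    def push(self, grad):
        with self._lock:
            for i, g in enumerate(grad):
                self.params[i] -= self.lr * g


def gradient(params, batch):
    """Gradient of mean squared error for a toy linear model y = w . x."""
    grad = [0.0] * len(params)
    for x, y in batch:
        err = sum(p * xi for p, xi in zip(params, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def run_replica(server, shard, steps, batch_size, seed):
    """One model replica: fetch parameters, compute a gradient, push it back."""
    rng = random.Random(seed)
    for _ in range(steps):
        params = server.fetch()  # may already be stale when the gradient is pushed
        batch = rng.sample(shard, batch_size)
        server.push(gradient(params, batch))


def train(n_replicas=4, steps=500, batch_size=8, lr=0.05, seed=0):
    rng = random.Random(seed)
    true_w = [2.0, -3.0]  # assumed ground truth for the synthetic data
    data = []
    for _ in range(400):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, sum(w * xi for w, xi in zip(true_w, x))))

    # Each replica trains on its own shard of the data.
    shards = [data[i::n_replicas] for i in range(n_replicas)]
    server = ParameterServer(dim=len(true_w), lr=lr)
    threads = [
        threading.Thread(
            target=run_replica,
            args=(server, shards[i], steps, batch_size, seed + i + 1),
        )
        for i in range(n_replicas)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.params, true_w


if __name__ == "__main__":
    learned, target = train()
    print("learned:", learned, "target:", target)
```

Because each replica may compute its gradient from parameters that other replicas have since changed, updates are applied to slightly stale values. On this small noise-free problem that staleness is harmless, but the sketch says nothing about how such a scheme behaves at the scale the paper targets.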

downpour sgd, model replica, neural network

Country:
- North America
  - United States > California
    - Santa Clara County > Mountain View (0.04)
  - Canada > Ontario
    - Toronto (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)