Hybrid Approach to Parallel Stochastic Gradient Descent

Vora, Aakash Sudhirbhai, Joshi, Dhrumil Chetankumar, Patel, Aksh Kantibhai

Jun-27-2024–arXiv.org Artificial Intelligence

Stochastic gradient descent (SGD) is used to train models on large datasets because it reduces training time. In addition, data parallelism is widely used to train neural networks efficiently across multiple worker nodes in parallel. Most systems implement data parallelism either synchronously or asynchronously, and both approaches have drawbacks. We propose a third approach to data parallelism: a hybrid that uses both synchronous and asynchronous parameter aggregation to train the neural network. When the threshold function is chosen appropriately so that parameter aggregation shifts gradually from asynchronous to synchronous, we show that, within a given time period, our hybrid approach outperforms both the asynchronous and the synchronous approaches.
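The abstract does not specify the threshold function or the aggregation rule, so the following is only a toy sketch of the general idea, not the authors' algorithm. It assumes the threshold is the number of worker gradients buffered before one averaged update is applied: a threshold of 1 behaves like asynchronous updates, and a threshold equal to the number of workers behaves like a synchronous step. The names `linear_threshold` and `hybrid_sgd`, the linear schedule, and the single-process simulation of workers are all hypothetical choices made for illustration.

```python
import random


def linear_threshold(step, total_steps, num_workers):
    """Hypothetical schedule: grows linearly from 1 (fully asynchronous)
    to num_workers (fully synchronous) over the course of training."""
    frac = step / max(1, total_steps - 1)
    return max(1, min(num_workers, 1 + round(frac * (num_workers - 1))))


def hybrid_sgd(data, num_workers=4, total_steps=2000, lr=0.5, seed=0):
    """Fit y = w * x by least squares with a simulated hybrid aggregator.

    Workers are simulated sequentially in one process; a real system
    would run them concurrently and would also see stale parameters.
    """
    rng = random.Random(seed)
    w = 0.0
    buffer = []  # gradients waiting to be aggregated
    for step in range(total_steps):
        k = linear_threshold(step, total_steps, num_workers)
        # One simulated worker computes a gradient on one sample.
        x, y = rng.choice(data)
        buffer.append(2.0 * (w * x - y) * x)
        # Apply an averaged update once k gradients have arrived.
        if len(buffer) >= k:
            w -= lr * sum(buffer) / len(buffer)
            buffer.clear()
    return w


# Noise-free data generated from y = 3 * x, with x in 0.1 .. 1.0.
data = [(i / 10, 3 * i / 10) for i in range(1, 11)]
w_hat = hybrid_sgd(data)
print(w_hat)
```

Early in the run every gradient is applied as soon as it arrives; late in the run updates wait for a gradient from each simulated worker. Swapping `linear_threshold` for another non-decreasing schedule changes how quickly the shift happens.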

algorithm, artificial intelligence, machine learning

Country:
- North America
  - United States
    - New York > New York County
      - New York City (0.04)
    - Colorado > Broomfield County
      - Broomfield (0.04)
    - Arizona > Maricopa County
      - Tempe (0.05)
  - Canada > Ontario
    - Toronto (0.04)
- Europe > Spain
  - Andalusia > Granada Province > Granada (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (1.00)
  - Neural Networks (1.00)
