Appendix) F (w

Mar-27-2025, 14:57:24 GMT–Neural Information Processing Systems

We pipeline data downloading and data ingestion to accelerate training. After computing the gradients, a worker sends the gradient, together with its token, directly back to the PS in a non-blocking way. As a result, fast workers ingest much more data than straggling workers. When a worker recovers from a failure, it drops its previous state (e.g., the data in the batch buffer and the token) and proceeds with a new batch. The loss of a specific token does not affect the correctness or efficiency of GBA.
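The worker-side flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `download_batches`, `compute_gradient`, `run_worker`, and `push_to_ps` are hypothetical, the gradient computation is a placeholder, and the sketch assumes that the token arrives together with the batch. A bounded queue stands in for the batch buffer so that downloading overlaps with computation, and a single background thread stands in for the non-blocking send to the PS.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def download_batches(num_batches, buffer):
    """Producer: stands in for data downloading and fills the batch buffer."""
    for i in range(num_batches):
        # (token, batch) pair; both values are placeholders for illustration.
        buffer.put((i, [float(i)] * 4))
    buffer.put(None)  # sentinel: no more data


def compute_gradient(batch):
    """Placeholder for the real forward/backward pass."""
    return sum(batch)


def run_worker(num_batches, push_to_ps, buffer_size=2):
    """Overlap downloading with computation and push gradients asynchronously."""
    buffer = queue.Queue(maxsize=buffer_size)  # bounded batch buffer
    downloader = threading.Thread(
        target=download_batches, args=(num_batches, buffer), daemon=True
    )
    downloader.start()

    pending = []
    # One sender thread: the training loop never waits for the PS round trip,
    # and pushes leave in the order they were submitted.
    with ThreadPoolExecutor(max_workers=1) as sender:
        while True:
            item = buffer.get()
            if item is None:
                break
            token, batch = item
            grad = compute_gradient(batch)
            pending.append(sender.submit(push_to_ps, token, grad))

    downloader.join()
    # Surface any exception raised while pushing to the PS.
    return [f.result() for f in pending]
```

In this sketch, recovery after a failure amounts to calling `run_worker` again with a fresh buffer: the old buffer contents and any token held at the time of the crash are simply discarded.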

artificial intelligence, gradient, machine learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.46)