Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience

Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran

arXiv.org Machine Learning 

Motivated by real-world applications such as recommendation systems, image recognition, and conversational AI, it has become crucial to implement learning algorithms in a distributed fashion. In a commonly used framework, namely data-parallelism, large datasets are distributed among several worker machines for parallel processing. In many applications, such as Federated Learning [KMRR16], data is stored on user devices such as mobile phones and personal computers, and in these applications, fully utilizing the on-device machine intelligence is an important direction for next-generation distributed learning. In a standard distributed framework, several worker machines store data, perform local computations, and communicate with a center machine (a parameter server); the center machine aggregates the local information from the worker machines and broadcasts updated parameters iteratively. In this setting, it is well known that one of the major challenges is to tackle the behavior of Byzantine machines [LSP82]. This can happen owing to software or hardware crashes, poor communication links between the worker machines and the center machine, stalled computations, and even coordinated or malicious attacks by a third party. In this setup, it is generally assumed (see [YCKB18, BMGS17]) that a subset of worker machines behave completely arbitrarily, even in a way that depends on the algorithm used and the data on the other machines, thereby capturing the unpredictable nature of the errors.
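To make the parameter-server pattern above concrete, the following is a minimal sketch of one such iterative loop with a few Byzantine workers. It assumes a toy least-squares objective, first-order local updates, and coordinate-wise median aggregation as the robust rule; these choices and all names in the code are illustrative assumptions and do not reflect the paper's Newton-type algorithm or its resilience scheme.

```python
# Minimal sketch of a Byzantine-resilient parameter-server round.
# Assumptions (not from the paper): toy least-squares data, gradient
# messages rather than Newton-type updates, coordinate-wise median
# aggregation at the center machine.
import numpy as np

rng = np.random.default_rng(0)

d, n_per_worker, num_workers, num_byzantine = 5, 200, 10, 2
w_true = rng.normal(size=d)

# Data-parallelism: each worker holds a local shard of the dataset.
shards = []
for _ in range(num_workers):
    X = rng.normal(size=(n_per_worker, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_per_worker)
    shards.append((X, y))

def local_gradient(w, shard):
    """Honest worker: gradient of the local least-squares loss."""
    X, y = shard
    return X.T @ (X @ w - y) / len(y)

def byzantine_message(w, shard):
    """Byzantine worker: an arbitrary (here, adversarially scaled) vector."""
    return -100.0 * local_gradient(w, shard)

w = np.zeros(d)
step_size = 0.5
for it in range(100):
    # Workers compute local updates; the first `num_byzantine` misbehave.
    msgs = [
        byzantine_message(w, shards[m]) if m < num_byzantine
        else local_gradient(w, shards[m])
        for m in range(num_workers)
    ]
    # Center machine aggregates robustly and broadcasts the new parameters.
    agg = np.median(np.stack(msgs), axis=0)
    w = w - step_size * agg

print("parameter error:", np.linalg.norm(w - w_true))
```

With only a minority of Byzantine workers, the coordinate-wise median ignores the adversarial messages and the iterates still converge to the honest solution; a plain average, by contrast, can be driven arbitrarily far off by a single malicious worker.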
