Distributed Newton Can Communicate Less and Resist Byzantine Workers
–Neural Information Processing Systems
We develop a distributed second order optimization algorithm that is communication-efficient as well as robust against Byzantine failures of the worker machines. We propose an iterative approximate Newton-type algorithm, where the worker machines communicate \emph{only once} per iteration with the central machine. This is in sharp contrast with the state-of-the-art distributed second order algorithms like GIANT \cite{giant}, DINGO\cite{dingo}, where the worker machines send (functions of) local gradient and Hessian sequentially; thus ending up communicating twice with the central machine per iteration. Furthermore, we employ a simple norm based thresholding rule to filter-out the Byzantine worker machines. We establish the linear-quadratic rate of convergence of our proposed algorithm and establish that the communication savings and Byzantine resilience attributes only correspond to a small statistical error rate for arbitrary convex loss functions.
Neural Information Processing Systems
Oct-11-2024, 10:29:06 GMT
- Technology: