
Collaborating Authors

 Yang, Zhixiong


Adversary-resilient Inference and Machine Learning: From Distributed to Decentralized

arXiv.org Machine Learning

Statistical inference and machine learning algorithms have traditionally been developed for data available at a single location. Unlike this centralized setting, modern datasets are increasingly distributed across multiple physical entities (sensors, devices, machines, data centers, etc.) for reasons that range from storage, memory, and computational constraints to privacy concerns and engineering needs. This has necessitated the development of inference and learning algorithms capable of operating on non-collocated data. Such algorithms can be divided into two broad categories, namely, distributed algorithms and decentralized algorithms. Distributed algorithms correspond to the setup in which the data-bearing entities (henceforth referred to as "nodes") communicate only with a single entity (referred to as the master node, central server, parameter server, fusion center, etc.), which is tasked with generating the final result. Such distributed setups arise in the context of parallel computing, where the focus is on computational speedups and/or overcoming memory/storage bottlenecks, and federated systems, where "raw" data collected by individual nodes cannot be shared with the master node due to either communication constraints (e.g., sensor networks) or privacy concerns (e.g., smartphone data). Decentralized algorithms, on the other hand, correspond to the setup that lacks a central server; instead, individual nodes in this setup communicate among themselves over a network (often ad hoc) to reach a common solution (i.e., achieve consensus) at all nodes. Such decentralized setups arise either out of the need to eliminate single points of failure in distributed setups or due to practical constraints, as in the internet of things and autonomous systems. We refer the reader to Figure 1 for examples of distributed and decentralized setups.

Is it distributed or is it decentralized? Inference and learning from non-collocated data have been studied for decades in computer science, control, signal processing, and statistics. Both among and within these disciplines, however, there is no consensus on the use of the terms "distributed" and "decentralized." Though many works share the definitions provided here, numerous authors use these two terms interchangeably, while some others reverse these definitions. Inference and machine learning algorithms involving non-collocated data are broadly divisible into the categories of (i) distributed algorithms and (ii) decentralized algorithms.
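To make the distinction concrete, the following is a minimal sketch (not taken from the paper; the data, network, and step size are illustrative) contrasting the two communication patterns on a toy least-squares problem: a distributed update in which nodes ship gradients to a central server, and a decentralized update in which each node averages its neighbors' iterates over a ring network before taking a local gradient step.

import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, step = 5, 3, 0.01
# Toy local data: node i holds (A_i, b_i) for the loss 0.5*||A_i x - b_i||^2.
A = [rng.standard_normal((10, dim)) for _ in range(num_nodes)]
b = [rng.standard_normal(10) for _ in range(num_nodes)]

def local_grad(i, x):
    # Gradient of node i's local least-squares loss at x.
    return A[i].T @ (A[i] @ x - b[i])

# Distributed (master/worker): nodes send local gradients to a central server,
# which averages them and updates the single shared model.
x_central = np.zeros(dim)
for _ in range(200):
    grads = [local_grad(i, x_central) for i in range(num_nodes)]
    x_central -= step * np.mean(grads, axis=0)

# Decentralized (peer-to-peer): there is no server; each node averages the
# iterates of its neighbors on a ring graph (a consensus step) and then takes
# a gradient step on its own local loss.
neighbors = {i: [(i - 1) % num_nodes, i, (i + 1) % num_nodes] for i in range(num_nodes)}
x_dec = [np.zeros(dim) for _ in range(num_nodes)]
for _ in range(200):
    mixed = [np.mean([x_dec[j] for j in neighbors[i]], axis=0) for i in range(num_nodes)]
    x_dec = [mixed[i] - step * local_grad(i, mixed[i]) for i in range(num_nodes)]

print(x_central, x_dec[0])  # the decentralized iterates approach a common solution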


BRIDGE: Byzantine-resilient Decentralized Gradient Descent

arXiv.org Machine Learning

Decentralized optimization techniques are increasingly being used to learn machine learning models from data distributed over multiple locations without gathering the data at any one location. Unfortunately, methods that are designed for faultless networks typically fail in the presence of node failures. In particular, Byzantine failures, corresponding to the scenario in which faulty/compromised nodes are allowed to arbitrarily deviate from an agreed-upon protocol, are the hardest to safeguard against in decentralized settings. This paper introduces a Byzantine-resilient decentralized gradient descent (BRIDGE) method for decentralized learning that, compared to existing works, is more efficient and scalable in higher-dimensional settings and that is deployable in networks that go beyond the star topology. The main contributions of this work include theoretical analysis of BRIDGE for strongly convex learning objectives and numerical experiments demonstrating the efficacy of BRIDGE for both convex and nonconvex learning tasks.
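The abstract does not spell out the update rule, so the following is only a rough sketch of a Byzantine-resilient decentralized iteration in the spirit of BRIDGE, assuming a coordinate-wise trimmed-mean screening of the received iterates; the complete graph, toy quadratic objectives, number of Byzantine nodes, and step size are illustrative assumptions rather than the paper's actual construction.

import numpy as np

rng = np.random.default_rng(1)
num_nodes, dim, num_byz, step = 6, 2, 1, 0.05
# Toy local objectives 0.5*||x - c_i||^2; honest nodes should end up close to
# one another despite the Byzantine node.
centers = [rng.standard_normal(dim) for _ in range(num_nodes)]
neighbors = {i: [j for j in range(num_nodes) if j != i] for i in range(num_nodes)}
byzantine = {0}  # node 0 arbitrarily deviates from the protocol

def trimmed_mean(vectors, f):
    # Coordinate-wise trimmed mean: in every coordinate, drop the f largest and
    # f smallest received values, then average what remains.
    V = np.sort(np.stack(vectors), axis=0)
    return V[f:len(vectors) - f].mean(axis=0)

x = [np.zeros(dim) for _ in range(num_nodes)]
for _ in range(300):
    # Byzantine nodes broadcast arbitrary vectors instead of their iterates.
    sent = [rng.standard_normal(dim) * 100 if i in byzantine else x[i]
            for i in range(num_nodes)]
    new_x = []
    for i in range(num_nodes):
        if i in byzantine:
            new_x.append(x[i])  # what a faulty node stores is irrelevant
            continue
        received = [sent[j] for j in neighbors[i]] + [x[i]]
        screened = trimmed_mean(received, num_byz)            # screening step
        new_x.append(screened - step * (x[i] - centers[i]))   # local gradient step
    x = new_x

print(np.round(x[1], 2), np.round(x[2], 2))  # honest nodes remain close together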


ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning

arXiv.org Machine Learning

Distributed machine learning algorithms enable processing of datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional do occur in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional distributed learning tasks. In this paper, two variants of an algorithm termed Byzantine-resilient distributed coordinate descent (ByRDiE) are developed and analyzed that solve distributed learning problems in the presence of Byzantine failures. Theoretical analysis as well as numerical experiments presented in the paper highlight the usefulness of ByRDiE for high-dimensional distributed learning in the presence of Byzantine failures.
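As a rough illustration of Byzantine-resilient coordinate descent in the spirit of ByRDiE (again a sketch under assumed details, not the paper's exact algorithm), the snippet below cycles through the coordinates one at a time; for each coordinate, every honest node screens the scalar values received from its neighbors with a trimmed mean and then takes a gradient step along that coordinate only. The complete graph, quadratic objectives, and step size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
num_nodes, dim, num_byz, step = 6, 4, 1, 0.05
centers = [rng.standard_normal(dim) for _ in range(num_nodes)]  # local optima of 0.5*||x - c_i||^2
byzantine = {0}   # node 0 sends arbitrary scalars
x = [np.zeros(dim) for _ in range(num_nodes)]

def scalar_trimmed_mean(values, f):
    # Drop the f largest and f smallest scalars, then average the rest.
    v = np.sort(np.asarray(values))
    return v[f:len(values) - f].mean()

for _ in range(100):                      # outer passes over the model
    for k in range(dim):                  # one coordinate at a time
        # Each node broadcasts only its k-th coordinate; Byzantine nodes lie.
        sent = [rng.standard_normal() * 100 if i in byzantine else x[i][k]
                for i in range(num_nodes)]
        for i in range(num_nodes):
            if i in byzantine:
                continue
            received = [sent[j] for j in range(num_nodes) if j != i] + [x[i][k]]
            screened = scalar_trimmed_mean(received, num_byz)  # screen the scalars
            grad_k = x[i][k] - centers[i][k]      # partial derivative of the local loss
            x[i][k] = screened - step * grad_k    # coordinate update

print(np.round(x[1], 2), np.round(x[2], 2))  # honest nodes roughly agree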