Adversary-resilient Inference and Machine Learning: From Distributed to Decentralized

Zhixiong Yang, Arpita Gang, Waheed U. Bajwa

arXiv.org Machine Learning 

Statistical inference and machine learning algorithms have traditionally been developed for data available at a single location. Unlike this centralized setting, modern datasets are increasingly distributed across multiple physical entities (sensors, devices, machines, data centers, etc.) for reasons that range from storage, memory, and computational constraints to privacy concerns and engineering needs. This has necessitated the development of inference and learning algorithms capable of operating on non-collocated data. Such algorithms can be divided into two broad categories, namely, distributed algorithms and decentralized algorithms. Distributed algorithms correspond to the setup in which the data-bearing entities (henceforth referred to as "nodes") communicate only with a single entity (referred to as master node, central server, parameter server, fusion center, etc.), which is tasked with generating the final result. Such distributed setups arise in the context of parallel computing, where the focus is computational speedups and/or overcoming memory/storage bottlenecks, and federated systems, where "raw" data collected by individual nodes cannot be shared with the master node due to either communication constraints (e.g., sensor networks) or privacy concerns (e.g., smartphone data). Decentralized algorithms, on the other hand, correspond to the setup that lacks a central server; instead, individual nodes in this setup communicate among themselves over a network (often ad hoc) to reach a common solution (i.e., achieve consensus) at all nodes. Such decentralized setups arise either out of the need to eliminate single points of failure in distributed setups or due to practical constraints, as in the internet of things and autonomous systems. We refer the reader to Figure 1 for examples of distributed and decentralized setups.

Is it distributed or is it decentralized?
Inference and learning from non-collocated data have been studied for decades in computer science, control, signal processing, and statistics. Both among and within these disciplines, however, there is no consensus on the use of the terms "distributed" and "decentralized." Though many works share the definitions provided here, numerous authors use the two terms interchangeably, and some others reverse the definitions. Inference and machine learning algorithms involving non-collocated data are broadly divisible into the categories of (i) distributed algorithms and (ii) decentralized algorithms.
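
To make the distinction between the two setups concrete, the following is a minimal sketch, not taken from the paper, in which every node holds a single number and the goal is to compute the network-wide average. The function names, the ring topology, and the choice of Metropolis weights for the consensus step are all illustrative assumptions; the sketch also assumes an undirected, connected network in which every node behaves honestly.

```python
# Toy contrast between the distributed and decentralized setups.
# All names, the ring topology, and the weighting rule are illustrative
# assumptions; every node is assumed to follow the protocol.

def distributed_average(local_values):
    """Distributed setup: each node sends its value to a single master
    node, which alone generates the final result."""
    return sum(local_values) / len(local_values)


def decentralized_average(local_values, neighbors, num_rounds=200):
    """Decentralized setup: there is no master node. Each node repeatedly
    mixes its value with those of its neighbors until all nodes
    (approximately) agree, i.e., reach consensus.

    neighbors[i] lists the nodes adjacent to node i in an undirected,
    connected graph. Metropolis weights are used so that the sum of the
    values is preserved and the consensus value is the true average.
    """
    x = list(local_values)
    degree = [len(nbrs) for nbrs in neighbors]
    for _ in range(num_rounds):
        new_x = []
        for i, nbrs in enumerate(neighbors):
            update = x[i]
            for j in nbrs:
                weight = 1.0 / (1 + max(degree[i], degree[j]))
                update += weight * (x[j] - x[i])
            new_x.append(update)
        x = new_x
    return x


# Five nodes arranged in a ring: node i talks to nodes i-1 and i+1.
values = [1.0, 2.0, 3.0, 4.0, 10.0]
ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]

master_result = distributed_average(values)
node_results = decentralized_average(values, ring)
print("master node:", master_result)
print("each node after consensus:", node_results)
```

In the distributed version only the master node ends up with the answer, and it is a single point of failure. In the decentralized version every node ends up with (approximately) the same answer using only neighbor-to-neighbor communication, at the cost of multiple communication rounds.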
