Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Zhou, Ping; Yu, Zhen; Ma, Jingyi; Tian, Maozai

arXiv.org Machine Learning 

Distributed statistical inference has recently attracted immense attention. Herein, we study the asymptotic efficiency of the maximum likelihood estimator (MLE), the one-step MLE, and the aggregated estimating equation estimator for generalized linear models with a diverging number of covariates. We then propose a novel method that obtains an asymptotically efficient estimator for large-scale distributed data using two rounds of communication between the local machines and the central server. The assumption on the number of machines in this paper is more relaxed and thus more practical for real-world applications. Simulations and a case study demonstrate the satisfactory finite-sample performance of the proposed estimators.

Keywords: Generalized linear models, Large-scale distributed data, Asymptotic efficiency, One-step MLE, Diverging p

MSC: 62J12

1. Introduction

Large-scale data sets have become increasingly common, and they are often stored across multiple machines. Since the cost of communication between machines is considerably higher than the cost of conducting statistical analysis on a single machine (Jaggi et al., 2014; Smith et al., 2018), it is inefficient to calculate a global estimator by transmitting the local data to a central machine. Further, traditional iterative algorithms applied in a distributed system, such as the Fisher-scoring algorithm for the maximum likelihood estimator (MLE) in generalized linear models (GLMs), cannot avoid multiple rounds of communication, which incur exorbitant costs. Therefore, communication-efficient distributed algorithms must be developed to accommodate the new features of modern data sets.
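The contrast between iterative Fisher scoring and a two-round scheme can be made concrete with a small sketch. The code below is not the estimator proposed in this paper, whose details are not given in this section; it is the classical one-step MLE idea, assuming a logistic regression model, simulated data, and equally sized shards. All function names (`local_mle`, `one_step_distributed`, and so on) are hypothetical and chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def score_and_information(X, y, beta):
    """Summed score vector and Fisher information of a logistic GLM at beta."""
    mu = sigmoid(X @ beta)
    score = X.T @ (y - mu)
    info = (X * (mu * (1.0 - mu))[:, None]).T @ X
    return score, info


def local_mle(X, y, n_iter=25, tol=1e-10):
    """Fisher scoring on a single data set; needs no communication."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score, info = score_and_information(X, y, beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def one_step_distributed(shards):
    """One-step estimator using two rounds of communication (illustrative)."""
    # Round 1: every machine sends its local MLE; the server averages them.
    beta_bar = np.mean([local_mle(X, y) for X, y in shards], axis=0)
    # Round 2: every machine sends its score and information at beta_bar.
    parts = [score_and_information(X, y, beta_bar) for X, y in shards]
    total_score = sum(s for s, _ in parts)
    total_info = sum(i for _, i in parts)
    # The server performs a single Fisher-scoring update.
    return beta_bar + np.linalg.solve(total_info, total_score)


# Simulated example (assumed sizes): K machines, n observations each, p covariates.
rng = np.random.default_rng(0)
p, K, n = 5, 10, 500
beta_true = np.linspace(-1.0, 1.0, p)
shards = []
for _ in range(K):
    X = rng.normal(size=(n, p))
    y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
    shards.append((X, y))

beta_hat = one_step_distributed(shards)
```

Each machine transmits only a vector of length p in the first round and a vector plus a p-by-p matrix in the second, rather than its raw data, whereas running Fisher scoring on the pooled data in a distributed system would repeat the second exchange at every iteration.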
