A debiased distributed estimation for sparse partially linear models in diverging dimensions

Shaogao Lv, Heng Lian

arXiv.org Machine Learning 

Under a big-data setting, data can no longer be stored and analyzed on a single machine, so dividing the data into many subsamples becomes a critical step for implementing any numerical algorithm. Distributed statistical estimation and distributed optimization have received increasing attention in recent years, and a flurry of research on very large-scale problems has emerged, such as McDonald et al. (2009); Zhang et al. (2013, 2015); Rosenblatt et al. (2016) and the references therein. In general, distributed algorithms can be classified into two families: data parallelism and task parallelism. Data parallelism distributes the data across different parallel computing nodes or machines, whereas task parallelism distributes different tasks across those nodes. We are only concerned with data parallelism in this paper. In particular, we primarily consider distributed estimation for partially linear models using the standard divide-and-conquer strategy. Divide and conquer is a simple and communication-efficient technique for handling big data and is commonly used in the statistical learning literature. To be precise, the full dataset is randomly partitioned among m machines, a local estimator is computed independently on each machine, and the central node then averages the local solutions into a global estimate. Partially linear models (PLM) (Härdle and Liang, 2007; Heckman, 1986), the leading example of semiparametric models, are an important class of tools for modeling complex data because they retain both interpretability and flexibility.
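The divide-and-conquer scheme described above (random partition into m subsamples, independent local fits, averaging at a central node) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' debiased sparse estimator: it assumes the standard partially linear model Y = X^T β + g(T) + ε, and each machine estimates β by ordinary least squares after approximating g with a polynomial sieve basis. In the sparse, diverging-dimension setting of the paper, the local OLS step would be replaced by a penalized and debiased estimator. The helper names `sieve_basis`, `local_plm_estimate` and `divide_and_conquer`, the basis degree, and the simulated data are all hypothetical.

```python
import numpy as np


def sieve_basis(t, degree=5):
    """Polynomial sieve basis 1, t, ..., t^degree used to approximate g(t).

    Assumption: a plain polynomial basis for illustration; B-splines are a
    common alternative in the partially linear model literature.
    """
    return np.vander(t, N=degree + 1, increasing=True)


def local_plm_estimate(X, t, y, degree=5):
    """Estimate beta on one machine by least squares on the design [X, B(t)].

    The sieve columns B(t) absorb the nonparametric component g(t);
    only the coefficients on the linear covariates X are returned.
    """
    design = np.hstack([X, sieve_basis(t, degree)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[: X.shape[1]]


def divide_and_conquer(X, t, y, m, degree=5, seed=0):
    """Randomly partition the data into m subsamples, fit each one
    independently, and average the m local estimates (the central node)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), m)
    local = [local_plm_estimate(X[p], t[p], y[p], degree) for p in parts]
    return np.mean(local, axis=0)


# Hypothetical simulated example: Y = X beta + sin(2*pi*T) + noise.
rng = np.random.default_rng(1)
n, p = 20000, 5
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X = rng.standard_normal((n, p))
t = rng.uniform(0.0, 1.0, n)
y = X @ beta + np.sin(2 * np.pi * t) + 0.5 * rng.standard_normal(n)

beta_hat = divide_and_conquer(X, t, y, m=10)
print(np.round(beta_hat, 2))
```

Only the m local coefficient vectors of length p travel to the central node, which is what makes the averaging step communication-efficient. A plain average like this can inherit the bias of each local estimator when local estimators are biased (as penalized ones typically are), which is the usual motivation for debiasing the local fits before averaging.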
