Divide-and-conquer methods for big data analysis

Chen, Xueying, Cheng, Jerry Q., Xie, Min-ge

Feb-21-2021–arXiv.org Machine Learning

In the context of big data analysis, the divide-and-conquer methodology refers to a multi-step process: first, split a data set into several smaller ones; then analyze each subset separately; finally, combine the results of the separate analyses. This approach is effective for large data sets that cannot be analyzed in their entirety on a single computer because of limits on memory or computing time. The combined results provide a statistical inference similar to the one obtained from analyzing the entire data set. This article reviews some recent developments in divide-and-conquer methods in a variety of settings, including combining based on parametric, semiparametric, and nonparametric models, and online sequential updating methods, among others. Theoretical development on the efficiency of divide-and-conquer methods is discussed. Examples of real-world data analyses are provided in various application areas.
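The split, analyze, combine steps described in the abstract can be illustrated with a minimal sketch. The model here (a least-squares slope through the origin) and the combining rule (an average weighted by each block's sum of squared predictors) are assumptions chosen for illustration, not a method taken from the paper; the function names `fit_block` and `divide_and_conquer_slope` are hypothetical. For this particular model the weighted combination reproduces the full-data estimate up to floating-point rounding, which is not true of divide-and-conquer estimators in general.

```python
import math
import random


def fit_block(xs, ys):
    """Fit y = b * x by least squares on one block.

    Returns the slope estimate and its weight sum(x^2).
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx, sxx


def divide_and_conquer_slope(xs, ys, n_blocks):
    """Split the data into blocks, fit each block, combine by weighted average."""
    size = math.ceil(len(xs) / n_blocks)
    # Step 1 and 2: split into consecutive blocks and analyze each separately.
    fits = [
        fit_block(xs[i:i + size], ys[i:i + size])
        for i in range(0, len(xs), size)
    ]
    # Step 3: combine the block estimates, weighting each by sum(x^2).
    total_weight = sum(weight for _, weight in fits)
    return sum(slope * weight for slope, weight in fits) / total_weight


# Simulated data (assumption: true slope 2.0, standard normal noise).
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

slope_dc = divide_and_conquer_slope(xs, ys, n_blocks=10)
slope_full, _ = fit_block(xs, ys)
```

In this sketch each block only has to return two numbers, so the blocks could be processed on separate machines or read from disk one at a time without ever holding the full data set in memory.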

divide-and-conquer approach, estimator, subset


Country:
- North America
  - Canada (0.04)
  - United States
    - Alaska (0.04)
    - New York > New York County
      - New York City (0.04)
    - New Jersey > Middlesex County
      - Piscataway (0.04)
- Asia > Middle East
  - Jordan (0.05)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government (0.68)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.71)
  - Artificial Intelligence
    - Cognitive Science > Problem Solving (1.00)
    - Representation & Reasoning > Uncertainty
      - Bayesian Inference (0.68)
    - Machine Learning
      - Statistical Learning (0.70)
      - Learning Graphical Models > Directed Networks
        - Bayesian Learning (0.46)
