Slow Kill for Big Data Learning
She, Yiyuan, Shen, Jianhui, Barbu, Adrian
–arXiv.org Artificial Intelligence
Big-data applications often involve a vast number of observations and features, creating new challenges for variable selection and parameter estimation. This paper presents a novel technique called "slow kill," which utilizes nonconvex constrained optimization, adaptive ℓ2-shrinkage, and increasing learning rates. The fact that the problem size can decrease during the slow kill iterations makes it particularly effective for large-scale variable screening. The interaction between statistics and optimization provides valuable insights into controlling the quantile, stepsize, and shrinkage parameters in order to relax the regularity conditions required to achieve the desired level of statistical accuracy. Experimental results on real and synthetic data show that slow kill outperforms state-of-the-art algorithms in various situations while being computationally efficient for large-scale data.

This paper studies how to build a parsimonious and predictive model in big-data applications, where both the number of predictors and the number of observations can be extremely large. Over the past decade, there have been significant advancements in statistical theory for the minimizers of the penalized problem (1). However, modern scientists often encounter challenges with big data that make it impractical to obtain globally optimal estimators, even when convexity is present. This paper aims to incorporate computational considerations into statistical modeling, resulting in a new big-data learning framework with theoretical guarantees.
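To make the ingredients named in the abstract concrete, here is a minimal sketch of a slow-kill-style iteration for sparse linear regression: a gradient step with ℓ2-shrinkage, followed by a quantile-based hard threshold that keeps only the largest coefficients, with the number of surviving variables shrinking along a schedule and the stepsize allowed to grow as the problem gets smaller. The function name slow_kill_sketch, the schedule, and the parameters eta and lr_growth are illustrative assumptions, not the paper's actual algorithm or tuning.

import numpy as np

def slow_kill_sketch(X, y, keep_schedule, eta=1e-2, lr_growth=1.05, n_iter=100):
    # Illustrative quantile-thresholding iteration (an assumption, not the paper's exact method).
    # keep_schedule: how many coefficients survive at each stage; eta: l2-shrinkage weight;
    # lr_growth: multiplicative stepsize increase per iteration.
    n, p = X.shape
    beta = np.zeros(p)
    active = np.arange(p)                      # the working problem can shrink over iterations
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + eta)
    for t in range(n_iter):
        Xa = X[:, active]
        grad = Xa.T @ (Xa @ beta[active] - y) / n + eta * beta[active]
        z = beta[active] - step * grad         # gradient step with l2 shrinkage
        q = keep_schedule[min(t, len(keep_schedule) - 1)]
        if q < z.size:                         # quantile control: keep the q largest |z|
            keep = np.sort(np.argsort(np.abs(z))[-q:])
            active, beta[:] = active[keep], 0.0
            beta[active] = z[keep]
        else:
            beta[active] = z
        # stepsize grows slowly, capped by the Lipschitz bound of the reduced problem
        step = min(step * lr_growth, 1.0 / (np.linalg.norm(X[:, active], 2) ** 2 / n + eta))
    return beta, active

# Hypothetical usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
beta_true = np.zeros(1000)
beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.standard_normal(200)
schedule = [400, 200, 100, 50, 20, 10, 5]      # gradually "kill" variables
beta_hat, support = slow_kill_sketch(X, y, schedule)

In this sketch, killing variables shrinks the active design matrix, which is what permits the stepsize cap to rise over the iterations; the schedule, eta, and lr_growth would need problem-specific tuning in practice.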
May-2-2023