Communication-Efficient l_0 Penalized Least Square
–arXiv.org Artificial Intelligence
In this paper, we propose a communication-efficient penalized regression algorithm for high-dimensional sparse linear regression models with massive data. The approach, named CESDAR, is a distributed algorithm built on the Enhanced Support Detection and Root finding algorithm. CESDAR uses data distributed across multiple machines to compute and update the active set, and it adopts the communication-efficient surrogate likelihood framework to approximate the full-sample optimal solution on the active set. Because no raw data is transmitted, the method enhances privacy and data security while significantly improving execution speed and substantially reducing communication costs. Notably, this approach achieves the same statistical accuracy as the global estimator. Furthermore, this paper explores an extended version of CESDAR to enhance algorithmic speed and an adaptive version of CESDAR to optimize parameter selection. Simulations and real-data benchmark experiments demonstrate the efficiency and accuracy of the CESDAR algorithm.

Introduction

The rapid development of data collection techniques has led to unprecedented growth in both the volume and dimensionality of data. Massive high-dimensional datasets entail high computational costs and run into memory constraints. Numerous methods have been used for variable selection and parameter estimation, including LASSO [1], adaptive LASSO [2], the smoothly clipped absolute deviation (SCAD) penalty [3], the minimax concave penalty (MCP) [4] and so on.
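To make the "support detection and root finding" idea named in the abstract more concrete, the following is a minimal single-machine sketch of a basic SDAR-style iteration in Python with NumPy. It is not the paper's CESDAR algorithm: the enhanced, distributed, and surrogate-likelihood parts are omitted, and the function name `sdar`, the sparsity level `T`, the iteration cap `max_iter`, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def sdar(X, y, T, max_iter=20):
    """Illustrative SDAR-style iteration for a fixed sparsity level T (hypothetical helper)."""
    n, p = X.shape
    beta = np.zeros(p)
    d = X.T @ (y - X @ beta) / n  # scaled correlation of residual with each column
    active = None
    for _ in range(max_iter):
        # Support detection: keep the T largest entries of |beta + d|.
        new_active = np.sort(np.argsort(-np.abs(beta + d))[:T])
        if active is not None and np.array_equal(new_active, active):
            break  # active set has stabilized
        active = new_active
        # Root finding: ordinary least squares restricted to the active set.
        beta = np.zeros(p)
        beta[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        # Update d on the inactive set; it is zero on the active set.
        d = X.T @ (y - X @ beta) / n
        d[active] = 0.0
    return beta, active

# Synthetic, noise-free example (assumed data, not from the paper).
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[3, 17, 42]] = [5.0, -4.0, 6.0]
y = X @ beta_true
beta_hat, support = sdar(X, y, T=3)
print(support)
```

In a distributed setting such as the one the abstract describes, the quantities `X.T @ (y - X @ beta)` would presumably be aggregated from per-machine summaries rather than computed from pooled raw data; the sketch above keeps everything on one machine for clarity.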
Apr-1-2025
- Country:
- Asia > Middle East > Jordan (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Genre:
- Research Report (1.00)
- Overview (0.67)
- Industry:
- Information Technology > Security & Privacy (1.00)