ZeroLM: Data-Free Transformer Architecture Search for Language Models
Chen, Zhen-Song, Ding, Hong-Wei, Wang, Xian-Jia, Pedrycz, Witold
arXiv.org Artificial Intelligence
Neural architecture search (NAS) provides a systematic framework for automating the design of neural network architectures, yet its widespread adoption is hindered by prohibitive computational requirements. Existing zero-cost proxy methods, while reducing search overhead, demonstrate inadequate performance in architecture ranking tasks, particularly for Transformer-based models where they often underperform simple parameter counting metrics. Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics computation while decomposing Transformer architectures into functionally distinct sub-modules, thereby optimizing the balance of their contributions to overall performance. Our comprehensive evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark. The proposed method exhibits exceptional computational efficiency while maintaining robust performance across diverse NAS benchmark tasks, offering a practical solution for large-scale architecture search.
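The ranking quality reported above is measured with rank correlations between proxy scores and measured benchmark performance. The following is a minimal sketch of how Spearman's rho and Kendall's tau can be computed for a set of candidate architectures. It is not the authors' evaluation code: the function names are illustrative, the score lists are made-up values rather than FlexiBERT data, and the formulas assume no tied values.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the squared rank-difference formula (no ties)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs (no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values: one zero-cost proxy score and one measured
# benchmark score per candidate architecture (not real results).
proxy_scores = [0.12, 0.47, 0.33, 0.80, 0.65]
benchmark_scores = [71.2, 74.9, 75.3, 79.1, 77.0]

print("Spearman's rho:", spearman_rho(proxy_scores, benchmark_scores))
print("Kendall's tau:", kendall_tau(proxy_scores, benchmark_scores))
```

Both measures depend only on the ordering of the architectures, not on the scale of the proxy, which is why they suit ranking tasks in architecture search. Kendall's tau counts pairwise disagreements directly and is typically smaller in magnitude than Spearman's rho on the same data, consistent with the 0.53 versus 0.76 reported in the abstract.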
Mar-24-2025
- Country:
  - North America > Canada
  - Europe > Middle East
    - Republic of Türkiye > Istanbul Province > Istanbul (0.04)
  - Asia
    - Macao (0.14)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - China > Hubei Province
      - Wuhan (0.05)
- Genre:
  - Research Report > New Finding (0.67)
- Technology: