LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models
The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models. However, finding architectures with the optimal trade-off between task performance (perplexity) and hardware constraints like peak memory utilization and latency is non-trivial. This is exacerbated by the proliferation of diverse hardware platforms. We leverage the somewhat surprising empirical observation that the number of decoder parameters in autoregressive Transformers has a high rank correlation with task performance, irrespective of the architecture topology. This observation organically induces a simple Neural Architecture Search (NAS) algorithm that uses decoder parameters as a proxy for perplexity, without the need for any model training.
Neural Information Processing Systems
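The abstract's core idea, ranking candidate decoder architectures by their parameter count instead of training them, can be illustrated with a minimal sketch. This is not the authors' code: the `DecoderConfig` fields, the approximate per-layer parameter formula (standard GPT-style attention plus feed-forward blocks, ignoring embeddings, biases, and layer norms), and the ranking helper are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's implementation): use decoder
# parameter count as a training-free proxy for perplexity and rank a search
# space of candidate architectures with it.
from dataclasses import dataclass


@dataclass
class DecoderConfig:
    n_layer: int   # number of decoder layers
    d_model: int   # hidden (model) dimension
    d_inner: int   # feed-forward inner dimension


def decoder_params(cfg: DecoderConfig) -> int:
    """Approximate non-embedding decoder parameter count for one config."""
    attn = 4 * cfg.d_model * cfg.d_model   # Q, K, V and output projections
    ffn = 2 * cfg.d_model * cfg.d_inner    # two feed-forward weight matrices
    return cfg.n_layer * (attn + ffn)


def rank_by_proxy(candidates: list[DecoderConfig]) -> list[DecoderConfig]:
    """Rank candidates by the proxy: more decoder parameters is assumed to
    correlate with lower perplexity, so sort in descending order."""
    return sorted(candidates, key=decoder_params, reverse=True)


if __name__ == "__main__":
    # Hypothetical search space of decoder configurations.
    search_space = [
        DecoderConfig(n_layer=6, d_model=512, d_inner=2048),
        DecoderConfig(n_layer=12, d_model=768, d_inner=3072),
        DecoderConfig(n_layer=8, d_model=640, d_inner=2560),
    ]
    for cfg in rank_by_proxy(search_space):
        print(cfg, decoder_params(cfg))
```

In a full search, this proxy ranking would be combined with measured hardware constraints (latency, peak memory) on the target device to pick architectures on the performance/efficiency Pareto frontier, with no training required during the search itself.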