LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models
Neural Information Processing Systems
The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models.
Aug-17-2025, 04:55:42 GMT