Deep Compression of Pre-trained Transformer Models

Neural Information Processing Systems 

Due to their excellent computational efficiency and scalability, transformer models can be trained on exceedingly large amounts of data at the expense of tremendous growth in model size.
