Amazon's BERT Optimal Subset: 7.9x Faster & 6.3x Smaller Than BERT

Nov-8-2020, 20:05:04 GMT–#artificialintelligence

The transformer-based architecture BERT has in recent years demonstrated the efficacy of large-scale pretrained models for natural language processing (NLP) tasks such as machine translation and question answering. BERT's large size and complex pretraining process, however, raise usability concerns for many researchers. In a new paper, a pair of Amazon Alexa researchers extract an optimal subset of architectural parameters for the BERT architecture by applying recent breakthroughs in algorithms for neural architecture search. The proposed optimal subset, "Bort," is just 5.5 percent of the effective size of the original BERT-large architecture (not counting the embedding layer) and 16 percent of its net size. Many attempts have been made to extract a simpler sub-architecture of BERT that matches the original's performance while simplifying pretraining and shortening inference time. Yet such sub-architectures still fall short of the original implementation in accuracy, the researchers say, and the choice of architectural parameters in these works often appears arbitrary.

News Web Page

Country:
- Asia > China (0.08)

Genre:
- Research Report > New Finding (0.73)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (0.57)
