Intuitive Introduction to BERT
Transformers are taking the world of NLP by storm. After being introduced in Vaswani et al.'s Attention is all you need paper back in 2017, they – and particularly their self-attention mechanism, which removes the need for recurrent elements altogether – have demonstrated state-of-the-art performance on a wide variety of language tasks. Nevertheless, what's good can still be improved, and this process has been applied to Transformers as well. After the introduction of the 'vanilla' Transformer by Vaswani and colleagues, researchers at OpenAI used just the decoder segment to build GPT, a model that performs very well on language tasks. However, according to Devlin et al., the authors of a 2018 paper about pretrained Transformers in NLP, such models have one shortcoming: the attention that they apply is unidirectional, meaning that each token can only attend to the tokens to its left. This hampers learning unnecessarily, they argue, and they proposed a bidirectional variant instead: BERT, or Bidirectional Encoder Representations from Transformers.
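To make the difference between unidirectional and bidirectional attention more concrete, here is a minimal, illustrative sketch (not BERT's actual implementation) of the attention masks involved. The function name `attention_mask` is hypothetical; it simply shows which positions each token is allowed to attend to in a decoder-style (unidirectional) versus an encoder-style (bidirectional) model.

```python
import numpy as np

def attention_mask(seq_len: int, bidirectional: bool) -> np.ndarray:
    """Return a boolean mask indicating which positions each token may attend to.

    bidirectional=True  -> every token can see every other token (BERT-style encoder).
    bidirectional=False -> each token only sees itself and earlier tokens (GPT-style decoder).
    """
    if bidirectional:
        return np.ones((seq_len, seq_len), dtype=bool)
    # Lower-triangular matrix: position i attends to positions 0..i only.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(attention_mask(4, bidirectional=False).astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]

print(attention_mask(4, bidirectional=True).astype(int))
# [[1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]]
```

In the unidirectional case, a token halfway through a sentence never sees the words that follow it; in the bidirectional case it can use context from both sides, which is precisely the property BERT exploits.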