Fine-Tuning Transformers for NLP


You can see a complete working example in our Colab Notebook, and you can play with the trained models on HuggingFace. Since first being introduced in the "Attention Is All You Need" paper, Transformers have redefined the field of Natural Language Processing (NLP), setting the state of the art on numerous tasks such as question answering, language generation, and named-entity recognition. Here we won't go into much detail about what a Transformer is, but rather focus on how to apply and train one for a task at hand. The main things to keep in mind conceptually about Transformers are that they are very good at dealing with sequential data (text, speech, etc.); that they act as an encoder-decoder framework, in which the encoder maps data to some representational space and the decoder then maps it to the output; and that they scale very well on parallel processing hardware (GPUs). Transformers in NLP have been trained on massive amounts of text data, which allows them to capture both the syntax and semantics of a language very well.

Easy Machine Translation with Machine Learning and HuggingFace Transformers – MachineCurve


Transformers have significantly changed the way in which Natural Language Processing tasks can be performed. This architecture, which trumps the classic recurrent one (and in some cases even LSTM-based architectures), has been around since 2017 and is in the process of being democratized today. Many tasks can benefit from these developments: text summarization, named entity recognition, and sentiment analysis, for example, can all be performed successfully with this type of model. In this tutorial, we will be looking at the task of machine translation. We'll first take a look at how Transformers can be used for this purpose, and see that they effectively perform a sequence-to-sequence learning task.

8 Leading Language Models For NLP In 2020


The introduction of transfer learning and pretrained language models in natural language processing (NLP) pushed forward the limits of language understanding and generation. Transfer learning and applying transformers to different downstream NLP tasks have become the main trend of the latest research advances. At the same time, there is controversy in the NLP community regarding the research value of the huge pretrained language models occupying the leaderboards. While lots of AI experts agree with Anna Rogers's statement that getting state-of-the-art results just by using more data and computing power is not research news, other NLP opinion leaders point out some positive aspects of the current trend, such as the possibility of seeing the fundamental limitations of the current paradigm. In any case, the latest improvements in NLP language models seem to be driven not only by massive boosts in computing capacity but also by the discovery of ingenious ways to lighten models while maintaining high performance.

Question Answering with Python, HuggingFace Transformers and Machine Learning – MachineCurve


If you would like to read about DistilBERT in more detail, I'd suggest reading the article, but according to the abstract it was made 60% faster through a 40% size reduction while retaining 97% of its language understanding capabilities. This is a significant improvement and a great optimization with respect to traditional or 'vanilla' BERT. The abstract reads: "As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster."
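
To make the idea of knowledge distillation more concrete, here is a minimal sketch of a generic soft-target loss, in which a student is penalized for diverging from a teacher's temperature-softened output distribution. This is an illustration only and not DistilBERT's actual training code: the function names, the default temperature of 2.0, and the toy logits are all assumptions made for this example, and the paper's full objective combines more than this single term.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's.

    Hypothetical helper for illustration; the temperature value is an assumption.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits over three classes (made-up numbers).
teacher = [1.0, 2.0, 3.0]
print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student matches the teacher
print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a smaller model can be trained on.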

Augment Your Small Dataset Using Transformers and Synonym Replacement for Sentiment Analysis-- Part…


Once done, the output will be passed to the next step in our augmentation pipeline. There are multiple ways to perform data augmentation for NLP. Some techniques are more complex than others, and all have pluses and minuses. NLP is by no means an exact science; understanding your domain and your task is pivotal when it comes to augmenting your data. One paper that I found interesting regarding data augmentation was 'EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks' by Jason Wei and Kai Zou. In it, Jason and Kai explore how Synonym Replacement (SR), Random Insertion (RI), Random Swap (RS) and Random Deletion (RD) can be lightweight and efficient ways of performing data augmentation, when and how they should be applied, and how they perform in NLP tasks compared to other methods.
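
The four operations named above can be sketched in a few lines of plain Python. This is a simplified illustration and not the paper's reference implementation: the `SYNONYMS` table is a hypothetical toy stand-in for a real thesaurus, and the function names and parameters are assumptions chosen for this example.

```python
import random

# Hypothetical toy synonym table; a real pipeline would query a thesaurus.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "happy": ["glad", "cheerful"],
}

def synonym_replacement(words, n, rng):
    """SR: replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_insertion(words, n, rng):
    """RI: n times, insert a synonym of a random known word at a random position."""
    out = list(words)
    known = [w for w in words if w in SYNONYMS]
    for _ in range(n):
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out

def random_swap(words, n, rng):
    """RS: n times, swap the words at two distinct random positions."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p, rng):
    """RD: drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

rng = random.Random(0)  # seeded so runs are repeatable
sentence = "the movie was good".split()
print(" ".join(synonym_replacement(sentence, 1, rng)))
print(" ".join(random_insertion(sentence, 1, rng)))
print(" ".join(random_swap(sentence, 1, rng)))
print(" ".join(random_deletion(sentence, 0.25, rng)))
```

Each function returns a new list and leaves its input untouched, so several augmented variants can be generated from the same source sentence.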