Top books on Transformers in 2022


In 2017, the Google Brain team introduced the Transformer in a paper called "Attention Is All You Need". They proposed a simple new network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Today, Transformers have become the model of choice for NLP and, increasingly, computer vision. The Transformer architecture implements an encoder-decoder structure without recurrence or convolutions. Given the importance of Transformers in machine learning and AI, we have listed five books that will help you better understand this sequence transduction model.
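The attention mechanism at the heart of that architecture is scaled dot-product attention. As defined in the original paper, for query, key and value matrices $Q$, $K$, $V$ with key dimension $d_k$:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

Each output position is a weighted average of the value vectors, with weights determined by how strongly its query matches every key; the $\sqrt{d_k}$ scaling keeps the dot products from saturating the softmax.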

Understanding Google's Switch Transformer


When GPT-3 was introduced by OpenAI in May 2020, the news spread like wildfire, not only amongst the AI community but in the mainstream media, with headlines like "A robot wrote this article" and "Have you read something written by GPT-3?". Before GPT-3, the largest language model was Turing-NLG, released in February 2020 with 17 billion parameters. Later that year, OpenAI blew this out of the park with 175 billion parameters. Suddenly, there was a language model that could produce content often indistinguishable from human writing.

Transformer in Transformer Artificial Intelligence

The Transformer is a type of self-attention-based neural network originally applied to NLP tasks. Recently, pure transformer-based models have been proposed to solve computer vision problems. These visual transformers usually view an image as a sequence of patches, but ignore the intrinsic structural information inside each patch. In this paper, we propose a novel Transformer-iN-Transformer (TNT) model for modeling both patch-level and pixel-level representations. In each TNT block, an outer transformer block processes patch embeddings, while an inner transformer block extracts local features from pixel embeddings. The pixel-level features are projected into the patch-embedding space by a linear transformation layer and then added to the patch embedding. By stacking TNT blocks, we build the TNT model for image recognition. Experiments on the ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the proposed TNT architecture. For example, our TNT achieves 81.3% top-1 accuracy on ImageNet, which is 1.5% higher than that of DeiT with similar computational cost. The code will be available at
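The data flow of a TNT block described above can be sketched in pure Python. This is a minimal illustration of the structure only, not the paper's implementation: it uses toy single-head self-attention with identity projections (no learned Q/K/V weights, layer norm, or MLP sublayers), and the input sizes and `proj` weights are made-up placeholders.

```python
import math


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def self_attention(seq):
    # Toy single-head self-attention over a sequence of vectors,
    # with identity Q/K/V projections for brevity.
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in seq]
        w = softmax(scores)
        out.append([sum(wj * v[d] for wj, v in zip(w, seq))
                    for d in range(len(q))])
    return out


def linear(vec, weight):
    # Project vec through a weight matrix given as a list of rows.
    return [sum(wi * xi for wi, xi in zip(row, vec)) for row in weight]


def tnt_block(patch_embs, pixel_embs, proj):
    # Inner transformer: attend over the pixels within each patch.
    pixels_out = [self_attention(p) for p in pixel_embs]
    # Project flattened pixel features into patch-embedding space
    # and add them to the corresponding patch embedding.
    mixed = []
    for patch, pix in zip(patch_embs, pixels_out):
        flat = [x for v in pix for x in v]
        mixed.append([a + b for a, b in zip(patch, linear(flat, proj))])
    # Outer transformer: attend over the (updated) patch embeddings.
    return self_attention(mixed), pixels_out


# Toy usage: 2 patches of dimension 4, each containing 3 pixels of dimension 2.
patch_embs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
pixel_embs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 0.5]],
]
proj = [[0.1] * 6 for _ in range(4)]  # maps flattened 3*2 pixel features to dim 4
patches_out, pixels_out = tnt_block(patch_embs, pixel_embs, proj)
```

Stacking this block repeatedly (patch and pixel embeddings both flowing through) is what yields the full TNT model described in the abstract.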

Easy Machine Translation with Machine Learning and HuggingFace Transformers – MachineCurve


Transformers have significantly changed the way in which Natural Language Processing tasks are performed. This architecture, which trumps the classic recurrent one – and even LSTM-based architectures in some cases – has been around since 2017 and is in the process of being democratized today. Many tasks can benefit from these developments: text summarization, named entity recognition and sentiment analysis, for example, can all be tackled successfully with this type of model. In this tutorial, we will look at the task of machine translation. We'll first take a look at how Transformers can be used for this purpose, and see that they effectively perform a sequence-to-sequence learning task.
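As a taste of what the tutorial covers, translation with HuggingFace Transformers can be as short as a few lines using the `pipeline` API. A minimal sketch, assuming the `transformers` package is installed and a model download is acceptable; `Helsinki-NLP/opus-mt-en-de` is one publicly available English-to-German model, chosen here for illustration.

```python
from transformers import pipeline

# Load a pretrained translation model (downloaded on first use).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

result = translator("Transformers have changed natural language processing.")
print(result[0]["translation_text"])
```

The pipeline wraps tokenization, the encoder-decoder forward pass, and decoding back into text, which is exactly the sequence-to-sequence behavior discussed in the tutorial.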

Facebook Open Sources a Chatbot That Can Discuss Any Topic - KDnuggets


I recently started a new newsletter focused on AI education, and it already has over 50,000 subscribers. TheSequence is a no-BS (meaning no hype and no news) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Last year, Facebook AI Research (FAIR) open sourced BlenderBot 1.0, the largest open-domain chatbot ever built. BlenderBot can engage in a wide variety of conversations across nearly any topic while displaying human-like characteristics such as empathy and personable levels of engagement.