How Transformers work in deep learning and NLP: an intuitive introduction

#artificialintelligence

The famous paper "Attention is all you need", published in 2017, changed the way we think about attention. Even so, 2020 was definitely the year of transformers! Starting from natural language, they have now moved into computer vision tasks. How did we go from attention to self-attention? Why does the transformer work so damn well? What are the critical components for its success? Read on and find out! In my opinion, transformers are not so hard to grasp.


Image Captioning with an End to End Transformer Network.

#artificialintelligence

Transformer Networks are deep learning models that learn context and meaning in sequential data by tracking the relationships between the elements of a sequence. Since their introduction by Google Brain in the revolutionary 2017 paper "Attention is all you need", transformers have been outperforming conventional neural networks in problem domains such as Neural Machine Translation, Text Summarization, Language Understanding, and other Natural Language Processing tasks. They have also proved quite effective in Computer Vision tasks such as Image Classification with Vision Transformers, and in Generative Networks. In this article, I will try to explain my understanding of the attention mechanism through Vision Transformers, and of sequence-to-sequence tasks through Transformer Networks. For problems in the image domain, such as Image Classification and feature extraction from images, deep convolutional architectures like ResNet and Inception are typically used.
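
To make "tracking relationships" concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside a transformer: every position in the sequence scores its relationship to every other position and returns a weighted mix of their values. The function name, toy shapes, and NumPy implementation are illustrative assumptions, not code from the article.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q = X @ W_q                               # queries
    K = X @ W_k                               # keys
    V = X @ W_v                               # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # pairwise relationship scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                        # each output mixes information from all positions

# Toy example: a "sequence" of 4 tokens with model dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```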


Natural Language Processing: the age of Transformers

#artificialintelligence

This article is the first installment of a two-post series on building a machine reading comprehension system using the latest advances in deep learning for NLP. Stay tuned for the second part, where we'll introduce a pre-trained model called BERT that will take your NLP projects to the next level! In the recent past, if you specialized in natural language processing (NLP), there may have been times when you felt a little jealous of your colleagues working in computer vision. It seemed as if they had all the fun: the annual ImageNet classification challenge, Neural Style Transfer, Generative Adversarial Networks, to name a few. At last, the dry spell is over, and the NLP revolution is well underway!


Essential Guide to Transformer Models in Machine Learning

#artificialintelligence

Transformer models have become the de facto standard for NLP tasks. As an example, I'm sure you've already seen the awesome GPT-3 transformer demos and articles detailing how much time and money it took to train. But even outside of NLP, you can also find transformers in the fields of computer vision and music generation. That said, for such a useful model, transformers are still very difficult to understand. It took me multiple readings of the Google research paper that first introduced transformers, and a host of blog posts, to really understand how they work. I'll try to keep the jargon and the technicality to a minimum, but do keep in mind that this topic is complicated. I'll also include some basic math and try to keep things light to ensure the long journey is fun. Q: Why should I understand Transformers? In the past, the state-of-the-art approach to language modeling problems (put simply, predicting the next word) and translation systems was the LSTM and GRU architecture (explained here), along with the attention mechanism.
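
To ground what "language modeling" means here, below is a minimal sketch of an LSTM-based next-word predictor, the kind of architecture described as the previous state of the art. The class name, vocabulary size, and dimensions are illustrative assumptions; a real system would add training code and, as the article notes, an attention mechanism on top.

```python
import torch
import torch.nn as nn

class TinyLSTMLanguageModel(nn.Module):
    """Toy next-word predictor: embed tokens, run an LSTM, score the vocabulary at each step."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer token indices
        x = self.embed(token_ids)
        hidden_states, _ = self.lstm(x)   # one hidden state per position
        return self.head(hidden_states)   # logits over the vocabulary at each position

model = TinyLSTMLanguageModel()
tokens = torch.randint(0, 1000, (2, 16))  # two sequences of 16 token ids
logits = model(tokens)                    # (2, 16, 1000): scores for the next word
print(logits.shape)
```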


Transformers

#artificialintelligence

Transformer models have become the go-to models for most NLP tasks. Many transformer-based models, like BERT, RoBERTa, and the GPT series, are considered state of the art in NLP. While NLP is overflowing with these models, transformers are also gaining popularity in Computer Vision. Transformers are now used for recognizing and generating images, image encoding, and much more. As transformer models take over the AI field, it is important to have a low-level understanding of how they work.
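
As a starting point for that low-level exploration, here is a minimal sketch of loading a pretrained BERT checkpoint with the Hugging Face transformers library and extracting contextual token embeddings. The article does not name this library or checkpoint; both are assumptions used for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT checkpoint and its tokenizer (checkpoint name is an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers track relationships within a sequence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per token, shape (batch, seq_len, hidden_size)
print(outputs.last_hidden_state.shape)
```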