Image Captioning with an End to End Transformer Network.


Transformer Networks are deep learning models that learn context and meaning in sequential data by tracking the relationships between the sequences. Since the introduction of Transformer Networks in 2017 by Google Brain in their revolutionary paper "Attention is all you need", transformers have been outperforming conventional neural networks in various problem domains, like Neural Machine Translation, Text Summarization, Language Understanding, and other Natural Language Processing tasks. Along with this, they have also proved to be quite effective in Computer Vision tasks like Image Classification with Vision Transformers and Generative Networks as well. In this article, I will be trying to elaborate on my understanding of the attention mechanism through vision transformers and on sequence to sequence tasks through Transformer Networks. For problems in the Image Domain, like Image Classification and feature extraction from Images, Deep Convolutional Neural Network architectures like ResNet and Inception are used.

Duplicate Docs Excel Report

None found

Similar Docs  Excel Report  more

None found