Image Captioning Using Hugging Face Vision Encoder Decoder -- A Step-by-Step Guide (Part 2)

#artificialintelligence 

In the previous article, we briefly discussed encoder-decoders and our approach to the captioning task. We fine-tuned a language model, which allowed the decoder to learn new words, generate brief captions, and save training time. This can be referred to as priming the decoder before the actual training on the captioning task. Before we get our hands dirty with the code, let us understand how the Vision Encoder Decoder module connects the two models (the image encoder and the text sequence generator) and how it deciphers what is present in the image. For this, you need a basic understanding of how transformer attention works and of terms like KEY, QUERY and VALUE.
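To make the KEY/QUERY/VALUE terminology concrete, here is a minimal NumPy sketch of the cross-attention step that connects the two models: the decoder's token states act as QUERIES, while the encoder's image-patch features supply the KEYS and VALUES. This is an illustrative simplification (identity projections stand in for the learned weight matrices, and the toy `tokens`/`patches` arrays are made up), not the actual Hugging Face implementation.

```python
import numpy as np

def cross_attention(decoder_states, encoder_states, d_k):
    # QUERY comes from the decoder's token states; KEY and VALUE come
    # from the encoder's image-patch features. In a real model each
    # would first pass through a learned linear projection.
    Q, K, V = decoder_states, encoder_states, encoder_states
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each token to each patch
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over patches
    return weights @ V  # each token gets a weighted mix of image features

# Toy data: 2 decoder tokens attending over 3 image-patch embeddings of width 4
tokens = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
patches = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
out = cross_attention(tokens, patches, d_k=4)
print(out.shape)  # one attended feature vector per decoder token
```

This is exactly the mechanism the Vision Encoder Decoder module wires up for every decoder layer: the caption being generated repeatedly "queries" the image features to decide which regions are relevant to the next word.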
