Image Captioning Using Hugging Face Vision Encoder Decoder -- Step-by-Step Guide (Part 2)
In the previous article, we briefly discussed encoder-decoder architectures and our approach to the captioning task. We fine-tuned a language model, which allowed the decoder to learn new words, generate brief captions, and save training time; this can be referred to as priming the decoder before the actual training on the captioning task. Before we get our hands dirty with the code, let us understand how the Vision Encoder Decoder module connects the two models (the image encoder and the text sequence generator) and how it deciphers what is present in an image. To follow along, you need a basic understanding of how transformer attention works and of the terms KEY, QUERY and VALUE.
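To make the KEY/QUERY/VALUE connection concrete, here is a minimal sketch (in NumPy, with randomly initialised projection matrices purely for illustration) of the cross-attention step that links the two models: the QUERY comes from the text decoder's hidden states, while the KEY and VALUE come from the image encoder's patch embeddings. The shapes (5 caption tokens, 197 ViT-style patches, hidden size 768) are illustrative assumptions, not values from the article.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, d_k=64, seed=0):
    """Single-head cross-attention: text tokens attend over image patches."""
    rng = np.random.default_rng(seed)
    d_model = decoder_states.shape[-1]
    # Projection matrices (random here; learned during training in practice)
    W_q = rng.standard_normal((d_model, d_k))
    W_k = rng.standard_normal((d_model, d_k))
    W_v = rng.standard_normal((d_model, d_k))
    Q = decoder_states @ W_q   # QUERY: from the text decoder
    K = encoder_states @ W_k   # KEY:   from the image encoder
    V = encoder_states @ W_v   # VALUE: from the image encoder
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row says how strongly one caption token attends to each image patch
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# 5 caption tokens attending over 197 image patch embeddings (ViT-style)
text_states = np.random.default_rng(1).standard_normal((5, 768))
patch_states = np.random.default_rng(2).standard_normal((197, 768))
out, weights = cross_attention(text_states, patch_states)
print(out.shape)      # (5, 64): one attended vector per caption token
print(weights.shape)  # (5, 197): one distribution over patches per token
```

This is how the decoder "looks at" the image while generating each word: the attention weights form a probability distribution over image patches for every caption token, and the output mixes the patch VALUEs accordingly.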
Jul-18-2022, 18:20:31 GMT