Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Neural Information Processing Systems 

Image captioning aims to describe visual content in natural language. As'a picture is worth a thousand words', there could be various correct descriptions for an image. However, with maximum likelihood estimation as the training objective, the captioning model is penalized whenever its prediction mismatches with the label.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found