Diverse Image Captioning with Context Object Split Latent Spaces
Neural Information Processing Systems
The word embedding dimension is 300. In Tab. 7 we further evaluate the diversity of COS-CVAE using self-CIDEr. We provide additional qualitative results in Tabs. In Tab. 12 we show the diverse captions for novel objects generated by our model, together with the corresponding regions. The evaluation server for nocaps accepts only one caption per image and therefore does not support methods that model one-to-many relationships between images and captions. In Figure 1 (left) we show the accuracy and diversity scores averaged across annotators; in Figure 1 (right) we show the accuracy and diversity scores from each annotator individually. We find that the captions generated by COS-CVAE are rated as more accurate than those of COS-CVAE (paired).
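The self-CIDEr diversity metric mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a precomputed pairwise CIDEr similarity matrix over a set of sampled captions and applies the ratio-of-singular-values formulation, so that identical captions score 0 and maximally dissimilar captions score 1.

```python
import numpy as np

def self_cider_diversity(sim: np.ndarray) -> float:
    """Diversity score from an m x m pairwise caption-similarity
    (e.g. CIDEr) kernel matrix; higher means more diverse.

    Sketch of the ratio-of-singular-values formulation: the more
    the leading singular value dominates, the more redundant the
    caption set is.
    """
    m = sim.shape[0]
    # Singular values of the symmetric similarity matrix.
    s = np.linalg.svd(sim, compute_uv=False)
    r = s[0] / s.sum()             # dominance of the leading mode
    return -np.log(r) / np.log(m)  # normalized to [0, 1]
```

For example, five identical captions give an all-ones similarity matrix and a score of 0, while five mutually dissimilar captions (identity similarity matrix) give a score of 1.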