Diverse Image Captioning with Context Object Split Latent Spaces
Neural Information Processing Systems
The word embedding dimension is 300. In Tab. 7 we further evaluate the diversity of COS-CVAE using self-CIDEr. We provide additional qualitative results in Tabs. In Tab. 12 we show the diverse captions for novel objects generated by our model, together with the corresponding regions. The evaluation server for nocaps accepts only one caption per image and therefore does not support methods that model one-to-many relationships between images and captions. In Figure 1 (left) we show the accuracy and diversity scores averaged across annotators; in Figure 1 (right) we show the accuracy and diversity scores from each annotator individually. We find that the captions generated by COS-CVAE are rated as more accurate than those of COS-CVAE (paired).
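The self-CIDEr diversity metric mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a precomputed pairwise CIDEr similarity matrix over a set of sampled captions and applies the ratio-of-singular-values formulation, so that identical captions score 0 and maximally dissimilar captions score 1.

```python
import numpy as np

def self_cider_diversity(sim: np.ndarray) -> float:
    """Diversity score from an m x m pairwise caption-similarity
    (e.g. CIDEr) kernel matrix; higher means more diverse.

    Sketch of the ratio-of-singular-values formulation: the more
    the leading singular value dominates, the more redundant the
    caption set is.
    """
    m = sim.shape[0]
    # Singular values of the symmetric similarity matrix.
    s = np.linalg.svd(sim, compute_uv=False)
    r = s[0] / s.sum()             # dominance of the leading mode
    return -np.log(r) / np.log(m)  # normalized to [0, 1]
```

For example, five identical captions give an all-ones similarity matrix and a score of 0, while five mutually dissimilar captions (identity similarity matrix) give a score of 1.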