Learning Distinct and Representative Modes for Image Captioning
Neural Information Processing Systems
Although mode collapse is usually an unwanted side effect in generative modeling, it is somewhat "welcomed" in state-of-the-art (SoTA) image captioning models, because it typically yields higher scores on reference-based metrics such as CIDEr, BLEU, and SPICE.
Feb-8-2026, 11:45:47 GMT