Adaptively Aligned Image Captioning via Adaptive Attention Time
Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
Neural Information Processing Systems
AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and doesn't introduce any noise to the parameter gradients.
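To make the idea of a learned, deterministic number of attention steps concrete, here is a minimal sketch in the spirit of adaptive computation time. It is an illustration under stated assumptions, not the authors' formulation: the function names `halting_weights` and `mix_states`, the parameters `max_steps` and `eps`, and the stopping rule are all hypothetical. It assumes each attention step produces a confidence value in (0, 1) and a state vector, and that the output for the decoding step is a weighted average of those states.

```python
def halting_weights(confidences, max_steps=4, eps=1e-2):
    """Return (weights, steps_taken) for one decoding step.

    Assumption: confidences[n] in (0, 1) says how ready the decoder is
    to emit a word after attention step n. Names and stopping rule are
    hypothetical, chosen only for illustration.
    """
    weights = []
    remaining = 1.0  # mass not yet assigned to any attention step
    for p in confidences[:max_steps]:
        weights.append(remaining * p)
        remaining *= 1.0 - p
        if remaining < eps:  # confident enough: stop attending
            break
    total = sum(weights)
    # Renormalise so the weights of the steps actually taken sum to 1.
    return [w / total for w in weights], len(weights)


def mix_states(states, weights):
    """Weighted average of per-attention-step state vectors."""
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]


if __name__ == "__main__":
    # A highly confident first step halts after a single attention step.
    print(halting_weights([0.995, 0.5]))
    # Low confidence keeps attending up to max_steps.
    w, n = halting_weights([0.5, 0.5, 0.5], max_steps=3)
    print(n, mix_states([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], w))
```

Every operation here is a product or a sum of the confidences with no sampling, so the weights are a deterministic function of the inputs; only the discrete step count depends on the threshold. In a real model the same computation would be written in an automatic differentiation framework so that gradients flow through the weights.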
Feb-15-2026, 10:09:11 GMT