Reviews: Adaptively Aligned Image Captioning via Adaptive Attention Time
Neural Information Processing Systems
Although the two techniques have been well explored individually, this is the first work to combine them for attention in image captioning. This should make reproducing the results easier. The base attention model is already doing much better than up-down attention and recent methods such as GCN-LSTM, so it's not clear where the gains are coming from. It'd be good to see AAT applied to traditional single-head attention instead of multi-head attention to show convincingly that AAT helps. For instance, how do the attention time steps vary with word position in the caption?
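To make the suggested single-head ablation concrete, here is a minimal sketch of Graves-style adaptive computation time (ACT) halting wrapped around plain single-head scaled dot-product attention. It is an illustrative assumption, not the authors' implementation. The halting layer (`halt_weight`, `halt_bias`), the rule that feeds each attended vector back into the query, `max_steps`, and `eps` are all hypothetical choices.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(query, keys, values):
    # One head of scaled dot-product attention (no learned projections).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def adaptive_attention(query, keys, values, halt_weight, halt_bias, max_steps=4, eps=0.01):
    """ACT-style halting: attend repeatedly until the cumulative halting
    probability reaches 1 - eps (or max_steps is hit). Returns the
    halting-weighted sum of the per-step attention outputs and the step count.
    Assumes query, keys, and values all share the same dimension."""
    cumulative = 0.0
    output = [0.0] * len(values[0])
    state = list(query)
    steps = 0
    for t in range(max_steps):
        attended = single_head_attention(state, keys, values)
        steps += 1
        # Hypothetical halting unit: a linear layer followed by a sigmoid.
        logit = sum(w * a for w, a in zip(halt_weight, attended)) + halt_bias
        p = 1.0 / (1.0 + math.exp(-logit))
        if cumulative + p >= 1.0 - eps or t == max_steps - 1:
            # Last step gets the remainder so the weights sum to exactly 1.
            p = 1.0 - cumulative
            output = [o + p * a for o, a in zip(output, attended)]
            break
        cumulative += p
        output = [o + p * a for o, a in zip(output, attended)]
        # Hypothetical state update: add the attended vector to the query.
        state = [s + a for s, a in zip(state, attended)]
    return output, steps
```

Logging the returned `steps` for each generated word would be one way to answer the question of how attention time varies with word position.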
Feb-5-2025, 23:46:49 GMT