Multi-modal Dependency Tree for Video Captioning
–Neural Information Processing Systems
Many existing methods generate captions using sequence models that predict words in a left-to-right order.
Feb-8-2026, 04:54:02 GMT