Multi-modal Dependency Tree for Video Captioning
