End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen

Neural Information Processing Systems 

To integrate the three modalities more effectively and enable inter-modal learning, we design a dynamic fusion scheme with transformers to model the interactions between modalities.
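
As a rough illustration of what attention-based fusion across three modalities can look like, the sketch below runs one self-attention pass over three modality feature vectors and then combines them with input-dependent weights. It is a minimal sketch under stated assumptions, not the paper's architecture: the identity query/key/value projections, the mean-activation gating rule, the function names, and the toy feature values are all hypothetical, and the three modalities are left unnamed because the text above does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(features):
    """One scaled dot-product attention pass over the modality tokens.

    Assumption: identity query/key/value projections (no learned weights),
    so each modality simply attends to every modality, itself included.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        weights = softmax([dot(query, key) / scale for key in features])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, features))
             for i in range(dim)]
        )
    return attended


def dynamic_fusion(features):
    """Fuse the attended tokens with weights that depend on the input.

    Assumption: the gate for each modality is a softmax over the mean
    activation of its attended token; a trained model would learn this.
    """
    attended = cross_modal_attention(features)
    gates = softmax([sum(token) / len(token) for token in attended])
    dim = len(features[0])
    fused = [sum(g * token[i] for g, token in zip(gates, attended))
             for i in range(dim)]
    return fused, gates


# Toy per-modality feature vectors (hypothetical values).
modality_a = [1.0, 0.0, 0.5, 0.2]
modality_b = [0.3, 0.8, 0.1, 0.0]
modality_c = [0.0, 0.4, 0.9, 0.6]

fused, gates = dynamic_fusion([modality_a, modality_b, modality_c])
print("gates:", gates)
print("fused:", fused)
```

Because the gates are recomputed for every input, the contribution of each modality changes from sample to sample, which is the sense in which such a fusion is "dynamic" here.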
