End-to-end Multi-modal Video Temporal Grounding

Neural Information Processing Systems 

To integrate the three modalities more effectively and enable inter-modal learning, we design a dynamic fusion scheme with transformers to model the interactions between modalities.
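
As a rough illustration of how attention lets tokens from several modalities interact, the sketch below concatenates the tokens of three feature streams and applies one round of single-head scaled dot-product self-attention. It is not the authors' fusion scheme: the names `fuse_modalities`, `stream_a`, `stream_b`, and `stream_c` are hypothetical, the projections are identity maps rather than learned weights, and the multi-head, feed-forward, and any dynamic weighting components of a full transformer are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so no learned parameters are involved.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for query in tokens:
        # Attention weights of this token over every token in the sequence.
        weights = softmax([dot(query, key) / scale for key in tokens])
        # Output is the weighted average of all value vectors.
        fused.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
    return fused


def fuse_modalities(stream_a, stream_b, stream_c):
    """Hypothetical helper: joint attention over three modality streams.

    Each stream is a list of feature vectors of the same dimension.
    Concatenating them lets every token attend across modalities.
    """
    return self_attention(stream_a + stream_b + stream_c)


if __name__ == "__main__":
    # Toy 2-dimensional features, one token per stream (illustrative only).
    fused = fuse_modalities([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]])
    for token in fused:
        print([round(v, 3) for v in token])
```

Because each output token is a weighted average over tokens from all three streams, information from every modality reaches every position. A trained model would replace the identity projections with learned query, key, and value matrices.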
