MomentDiff: Generative Video Moment Retrieval from Random to Real

Neural Information Processing Systems 

To achieve this goal, we propose MomentDiff, a generative diffusion-based framework that simulates a typical human retrieval process, from random browsing to gradual localization. Specifically, we first diffuse the real span into random noise, then learn to denoise the random noise back to the original span, guided by the similarity between text and video.
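
The diffuse-then-denoise idea can be illustrated with a minimal sketch of a standard DDPM-style forward process applied to a temporal span. Everything specific here is an assumption for illustration and is not taken from the paper: the (center, width) span parameterization normalized to [0, 1], the linear beta schedule, the number of steps, and the function names `make_alpha_bar`, `diffuse_span`, and `recover_span`. In the actual framework the noise estimate would come from a learned denoiser conditioned on text-video similarity; the sketch substitutes the true noise as an oracle, only to show that the span is recoverable in principle.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def diffuse_span(span, t, alpha_bar, rng):
    """Forward process: blend the real span with Gaussian noise at step t."""
    # Map the normalized span from [0, 1] to [-1, 1] (an assumed convention).
    x0 = np.asarray(span, dtype=np.float64) * 2.0 - 1.0
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

def recover_span(x_t, predicted_noise, t, alpha_bar):
    """Invert the forward process given a noise estimate for step t."""
    x0 = (x_t - np.sqrt(1.0 - alpha_bar[t]) * predicted_noise) / np.sqrt(alpha_bar[t])
    # Map back from [-1, 1] to the normalized [0, 1] range.
    return (x0 + 1.0) / 2.0

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical ground-truth moment: centered at 40% of the video, 20% long.
span = [0.4, 0.2]

# At the last step the span is almost entirely replaced by noise.
x_t, noise = diffuse_span(span, t=999, alpha_bar=alpha_bar, rng=rng)

# Oracle stand-in for the learned, text-and-video-guided denoiser.
recovered = recover_span(x_t, noise, t=999, alpha_bar=alpha_bar)
print("noisy span:", x_t)
print("recovered span:", recovered)
```

At inference time there is no real span to start from, so a sampler of this kind would begin from pure Gaussian noise and apply the learned denoiser over several steps, which corresponds to the "random browsing to gradual localization" description.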
