MomentDiff: Generative Video Moment Retrieval from Random to Real

Neural Information Processing Systems

To achieve this goal, we propose MomentDiff, a generative diffusion-based framework that simulates a typical human retrieval process, moving from random browsing to gradual localization. Specifically, we first diffuse the real span into random noise, then learn to denoise that noise back to the original span, guided by the similarity between text and video.
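
The forward half of this idea (diffusing a real span into noise) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a span is encoded as a normalized `(center, width)` pair, uses the standard DDPM closed-form forward step with a cosine noise schedule, and leaves out the learned, text-and-video-guided denoiser entirely. The names `alpha_bar` and `diffuse_span` are hypothetical.

```python
import math
import random


def alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal rate of a cosine schedule (assumed here, not taken from the source)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)  # normalized so that alpha_bar(0) == 1


def diffuse_span(span, t, num_steps, rng):
    """Sample x_t from q(x_t | x_0) for a span x_0 = (center, width).

    Standard DDPM forward step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps,
    with eps drawn from a unit Gaussian.
    """
    a = alpha_bar(t, num_steps)
    return tuple(
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in span
    )


if __name__ == "__main__":
    rng = random.Random(0)
    real_span = (0.40, 0.20)  # hypothetical ground-truth moment: center, width
    for step in (0, 10, 25, 50):
        print(step, diffuse_span(real_span, step, 50, rng))
```

At step 0 the span is returned unchanged, and at the final step it is essentially pure Gaussian noise. A trained model would run the process in reverse, starting from noise and refining the span step by step under text-video guidance.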