Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation

Open in new window