A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Neural Information Processing Systems
The arrival of Sora marks a new era for text-to-video diffusion models, bringing significant advances in video generation and its potential applications. However, Sora, like other text-to-video diffusion models, relies heavily on prompts, yet no publicly available dataset is dedicated to the study of text-to-video prompts. In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 Million unique text-to-Video Prompts from real users. The dataset also includes 6.69 million videos generated by four state-of-the-art diffusion models, along with related data. We first describe the curation of this large-scale dataset, a process that is both time-consuming and costly. We then underscore the need for a new prompt dataset designed specifically for text-to-video generation by showing how VidProM differs from DiffusionDB, a large-scale prompt-gallery dataset for image generation. The scale and diversity of the dataset also open up many new research directions.
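The two headline counts fit together under one assumption that the abstract does not state outright: that each of the four models generated exactly one video per prompt. The check below is a hypothetical consistency sketch under that assumption, not a figure taken from the paper. Because the prompt count is rounded to 1.67 million, the true count lies somewhere in [1.665, 1.675) million, and four times that interval contains the reported 6.69 million videos.

```latex
% Hypothetical check; ASSUMES one video per prompt per model (not stated in the abstract).
% P: unique prompts, M: number of generation models, V: generated videos
V = M \cdot P, \qquad M = 4, \quad P \in [1.665,\, 1.675) \times 10^{6}
\;\Rightarrow\; V \in [6.66,\, 6.70) \times 10^{6} \ni 6.69 \times 10^{6}
```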
May-30-2025, 05:26:13 GMT