1e6057620ed314b0020b3a30284b0f83-Paper-Datasets_and_Benchmarks_Track.pdf
Neural Information Processing Systems
Specifically, through clustering, we first identify 1,291 user-focused topics from the million-scale real text-to-video prompt dataset, VidProM. Then, we use these topics to retrieve videos from YouTube, split the retrieved videos into clips, and generate both brief and detailed captions for each clip. After verifying the clips with specified topics, we are left with about 1.09 million video clips. Our experiments reveal that (1) current 16 text-to-video models do not achieve consistent performance across all user-focused topics; and (2) a simple model trained on VideoUFO outperforms others on worst-performing topics. The dataset and code are publicly available here and here under the CC BY 4.0 License.
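The per-topic comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes each generated clip already has a topic label and a scalar quality score; the function names, topic names, and scores below are hypothetical placeholders, not the authors' evaluation code or results.

```python
from collections import defaultdict
from statistics import mean


def per_topic_means(records):
    """Average per-clip scores within each topic.

    records: iterable of (topic, score) pairs (hypothetical format).
    """
    by_topic = defaultdict(list)
    for topic, score in records:
        by_topic[topic].append(score)
    return {topic: mean(scores) for topic, scores in by_topic.items()}


def worst_topics(topic_means, k):
    """Return the k topics with the lowest mean score, lowest first."""
    return sorted(topic_means, key=topic_means.get)[:k]


# Toy, made-up scores purely for illustration (not results from the paper).
toy_records = [
    ("topic_a", 0.8), ("topic_a", 0.6),
    ("topic_b", 0.3), ("topic_b", 0.5),
    ("topic_c", 0.9),
]
topic_means = per_topic_means(toy_records)
worst = worst_topics(topic_means, k=2)
```

Averaging within each topic before ranking keeps topics with many clips from dominating the comparison, which is one simple way to surface the worst-performing topics for a given model.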
Jun-15-2026, 12:18:42 GMT
- Genre:
  - Research Report > Experimental Study (1.00)
- Industry:
  - Law (0.46)
  - Information Technology (0.46)
  - Health & Medicine (0.46)
- Technology:
  - Information Technology
    - Communications > Social Media (1.00)
    - Artificial Intelligence
      - Machine Learning > Neural Networks (0.94)
      - Natural Language > Large Language Model (0.69)
      - Vision (0.69)