1e6057620ed314b0020b3a30284b0f83-Paper-Datasets_and_Benchmarks_Track.pdf

Jun-15-2026, 12:18:42 GMT–Neural Information Processing Systems

Specifically, through clustering, we first identify 1,291 user-focused topics from the million-scale real text-to-video prompt dataset, VidProM. Then, we use these topics to retrieve videos from YouTube, split the retrieved videos into clips, and generate both brief and detailed captions for each clip. After verifying the clips with specified topics, we are left with about 1.09 million video clips. Our experiments reveal that (1) 16 current text-to-video models do not achieve consistent performance across all user-focused topics; and (2) a simple model trained on VideoUFO outperforms others on worst-performing topics. The dataset and code are publicly available under the CC BY 4.0 License.
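The per-topic comparison described in the abstract can be illustrated with a minimal sketch. It assumes each generated video has already been given a numeric quality score and a topic label; the abstract does not say which metric is used, so the scores, the topic names, and the helper names `per_topic_means` and `worst_topics` below are all hypothetical and are not taken from the paper's code.

```python
from collections import defaultdict
from statistics import mean


def per_topic_means(records):
    """Map each topic to the mean of its scores.

    `records` is an iterable of (topic, score) pairs. The score is a
    stand-in for whatever per-video metric is used (an assumption here).
    """
    by_topic = defaultdict(list)
    for topic, score in records:
        by_topic[topic].append(score)
    return {topic: mean(scores) for topic, scores in by_topic.items()}


def worst_topics(records, k):
    """Return the k topics with the lowest mean score, worst first.

    Ties are broken alphabetically so the result is deterministic.
    """
    means = per_topic_means(records)
    return sorted(means, key=lambda t: (means[t], t))[:k]


# Hypothetical toy scores for a single model; not real results.
toy_records = [
    ("cat", 0.9), ("cat", 0.8),
    ("firefly", 0.25), ("firefly", 0.75),
    ("glass blowing", 0.5), ("glass blowing", 0.25),
]

print(worst_topics(toy_records, 2))
```

Running the same aggregation for each model and comparing the resulting worst-topic lists is one simple way to see whether performance is consistent across topics.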

large language model, machine learning, natural language

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Law (0.46)
- Information Technology (0.46)
- Health & Medicine (0.46)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Machine Learning > Neural Networks (0.94)
    - Natural Language > Large Language Model (0.69)
    - Vision (0.69)
