VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Jun-11-2026, 00:33:14 GMT–Neural Information Processing Systems

Text-to-video generative models convert textual prompts into dynamic visual content, offering wide-ranging applications in film production, gaming, and education. However, their real-world performance often falls short of user expectations. One key reason is that these models have not been trained on videos related to some topics users want to create. In this paper, we propose VideoUFO, the first Video dataset specifically curated to align with Users' FOcus in real-world scenarios. Beyond this, our VideoUFO also features: (1) minimal (0.29%) overlap with existing video datasets, and (2) videos searched exclusively via YouTube's official API under the Creative Commons license.

artificial intelligence, name change, proceedings

Technology:
- Information Technology > Artificial Intelligence (0.39)