Bench to Time-lapse Video Generation

Feb-9-2026, 15:56:51 GMT–Neural Information Processing Systems

The emergence of large-scale text-to-image models [92, 60, 59, 58, 42, 5, 94, 14, 54, 40] has significantly advanced the field of Text-to-Video (T2V) generation [66, 6, 7, 21, 73, 90]. Existing T2V architectures fall into two categories: U-Net-based and DiT-based. The latter aims to reproduce open-source architectures similar to Sora [9], using the DiT (Diffusion Transformer) [57] framework for T2V generation [43, 95, 93, 20].

When calculating the MTScore, the video retrieval model uses these texts to evaluate each frame of the video, assigning probabilities according to how well each frame matches them. The final result is obtained by summing the general probability and the metamorphic probability.
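The excerpt above leaves several details open. It does not say which retrieval model is used, what the texts are, or how per-frame matches become the general and metamorphic probabilities. The following is a minimal, hypothetical sketch of only the final aggregation step, under these assumptions:

- A retrieval model has already produced a similarity logit for every (frame, text) pair.
- Each logit is turned into an independent match probability with a sigmoid.
- Each group's probability is the mean over all of its frames and texts.

The function names `group_probability` and `mt_score_sketch` are illustrative inventions, not the benchmark's code.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (assumed logit-to-probability mapping)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def group_probability(frame_scores: Sequence[Sequence[float]]) -> float:
    """Mean match probability of one text group over all frames.

    frame_scores[f][t] is a hypothetical retrieval-model logit between
    frame f and text t of the group (e.g. "general" or "metamorphic" texts).
    """
    probs = [sigmoid(s) for frame in frame_scores for s in frame]
    if not probs:
        raise ValueError("need at least one frame/text score")
    return sum(probs) / len(probs)


def mt_score_sketch(general_scores: Sequence[Sequence[float]],
                    metamorphic_scores: Sequence[Sequence[float]]) -> float:
    """Hypothetical final score: general probability + metamorphic probability."""
    return group_probability(general_scores) + group_probability(metamorphic_scores)


if __name__ == "__main__":
    # Two frames, two general texts and one metamorphic text per frame (made-up logits).
    general = [[0.2, -0.1], [0.4, 0.0]]
    metamorphic = [[1.5], [2.0]]
    print(round(mt_score_sketch(general, metamorphic), 3))
```

Using an independent sigmoid per pair is one possible design choice. A softmax taken jointly over only the two groups would make their probabilities always sum to 1, so summing them would carry no information.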

artificial intelligence, machine learning, video

25b9960c8a5bd887eb5476c951260403-Supplemental-Datasets_and_Benchmarks_Track.pdf