Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
Huang, Haoyang, Ma, Guoqing, Duan, Nan, Chen, Xing, Wan, Changyi, Ming, Ranchen, Wang, Tianyu, Wang, Bo, Lu, Zhiying, Li, Aojie, Zeng, Xianfang, Zhang, Xinhao, Yu, Gang, Yin, Yuhe, Wu, Qiling, Sun, Wen, An, Kang, Han, Xin, Sun, Deshan, Ji, Wei, Huang, Bizhu, Li, Brian, Wu, Chenfei, Huang, Guanzhe, Xiong, Huixin, He, Jiaxin, Wu, Jianchang, Yuan, Jianlong, Wu, Jie, Liu, Jiashuai, Guo, Junjing, Tan, Kaijun, Chen, Liangyu, Chen, Qiaohui, Sun, Ran, Yuan, Shanshan, Yin, Shengming, Liu, Sitong, Chen, Wei, Dai, Yaqi, Luo, Yuchu, Ge, Zheng, Guan, Zhisheng, Song, Xiaoniu, Zhou, Yu, Jiao, Binxing, Chen, Jiansheng, Li, Jing, Zhou, Shuchang, Zhang, Xiangyu, Xiu, Yi, Zhu, Yibo, Shum, Heung-Yeung, Jiang, Daxin
–arXiv.org Artificial Intelligence
We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results demonstrate the state-of-the-art performance of Step-Video-TI2V in the image-to-video generation task.
arXiv.org Artificial Intelligence
Mar-14-2025