AITopics | He, Jingwen

He, Jingwen

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Fan, Weichen, Si, Chenyang, Song, Junhao, Yang, Zhenyu, He, Yinan, Zhuo, Long, Huang, Ziqi, Dong, Ziyue, He, Jingwen, Pan, Dongwei, Wang, Yi, Jiang, Yuming, Wang, Yaohui, Gao, Peng, Chen, Xinyuan, Li, Hengjie, Lin, Dahua, Qiao, Yu, Liu, Ziwei

arXiv.org Artificial Intelligence, Jan-14-2025

We present Vchitect-2.0, a parallel transformer architecture designed to scale up video diffusion models for large-scale text-to-video generation. The overall Vchitect-2.0 system has several key designs. (1) By introducing a novel Multimodal Diffusion Block, our approach achieves consistent alignment between text descriptions and generated video frames, while maintaining temporal coherence across sequences. (2) To overcome memory and computational bottlenecks, we propose a Memory-efficient Training framework that incorporates hybrid parallelism and other memory reduction techniques, enabling efficient training of long video sequences on distributed systems. (3) Additionally, our enhanced data processing pipeline ensures the creation of Vchitect T2V DataVerse, a high-quality million-scale training dataset through rigorous annotation and aesthetic evaluation. Extensive benchmarking demonstrates that Vchitect-2.0 outperforms existing methods in video quality, training efficiency, and scalability, serving as a suitable base for high-fidelity video generation.

artificial intelligence, machine learning, video

2501.08453

Country:

Asia > China (1.00)
North America > United States > Massachusetts (0.14)

Genre:

Personal (0.93)
Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Real-world Video Face Restoration: A New Benchmark

Chen, Ziyan, He, Jingwen, Lin, Xinqi, Qiao, Yu, Dong, Chao

arXiv.org Artificial Intelligence, May-4-2024

Blind face restoration (BFR) on images has significantly progressed over the last several years, while real-world video face restoration (VFR), which is more challenging because it involves more complex face motions such as changing gaze directions and facial orientations, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which are limited in their coverage of real-world video frames. In this work, we introduced new real-world datasets named FOS, with a taxonomy of "Full, Occluded, and Side" faces drawn mainly from video frames, to study the applicability of current methods on videos. Compared with existing test datasets, FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmarked both the state-of-the-art BFR methods and the video super resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we studied the effectiveness of the commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics by leveraging a subjective user study. With extensive experimental results and detailed analysis provided, we gained insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope will stimulate future advances in VFR research.

artificial intelligence, dataset, machine learning

2404.195

Country: Asia > China (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)