AITopics | Li, Zongjian

Collaborating Authors

Li, Zongjian

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Open-Sora Plan: Open-Source Large Video Generation Model

Lin, Bin, Ge, Yunyang, Cheng, Xinhua, Li, Zongjian, Zhu, Bin, Wang, Shaodong, He, Xianyi, Ye, Yang, Yuan, Shenghai, Chen, Liuhan, Jia, Tanghui, Zhang, Junwu, Tang, Zhenyu, Pang, Yatian, She, Bin, Yan, Cen, Hu, Zhiheng, Dong, Xiaoyi, Chen, Lin, Pan, Zhang, Zhou, Xing, Dong, Shaoling, Tian, Yonghong, Yuan, Li

arXiv.org Artificial IntelligenceNov-28-2024

We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs. Our project comprises multiple components for the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. Moreover, many assistant strategies for efficient training and inference are designed, and a multi-dimensional data curation pipeline is proposed for obtaining desired high-quality data. Benefiting from efficient thoughts, our Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope our careful design and practical experience can inspire the video generation research community. All our codes and model weights are publicly available at \url{https://github.com/PKU-YuanGroup/Open-Sora-Plan}.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.00131

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Li, Zongjian, Lin, Bin, Ye, Yang, Chen, Liuhan, Cheng, Xinhua, Yuan, Shenghai, Yuan, Li

arXiv.org Artificial IntelligenceNov-27-2024

Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space, becoming a key component of most Latent Video Diffusion Models (LVDMs) to reduce model training costs. However, as the resolution and duration of generated videos increase, the encoding cost of Video VAEs becomes a limiting bottleneck in training LVDMs. Moreover, the block-wise inference method adopted by most LVDMs can lead to discontinuities of latent space when processing long-duration videos. The key to addressing the computational bottleneck lies in decomposing videos into distinct components and efficiently encoding the critical information. Wavelet transform can decompose videos into multiple frequency-domain components and improve the efficiency significantly, we thus propose Wavelet Flow VAE (WF-VAE), an autoencoder that leverages multi-level wavelet transform to facilitate low-frequency energy flow into latent representation. Furthermore, we introduce a method called Causal Cache, which maintains the integrity of latent space during block-wise inference. Compared to state-of-the-art video VAEs, WF-VAE demonstrates superior performance in both PSNR and LPIPS metrics, achieving 2x higher throughput and 4x lower memory consumption while maintaining competitive reconstruction quality. Our code and models are available at https://github.com/PKU-YuanGroup/WF-VAE.

artificial intelligence, machine learning, video, (16 more...)

arXiv.org Artificial Intelligence

2411.17459

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Build Your Own Robot Friend: An Open-Source Learning Module for Accessible and Engaging AI Education

Shi, Zhonghao, O'Connell, Allison, Li, Zongjian, Liu, Siqi, Ayissi, Jennifer, Hoffman, Guy, Soleymani, Mohammad, Matarić, Maja J.

arXiv.org Artificial IntelligenceJan-6-2024

As artificial intelligence (AI) is playing an increasingly important role in our society and global economy, AI education and literacy have become necessary components in college and K-12 education to prepare students for an AI-powered society. However, current AI curricula have not yet been made accessible and engaging enough for students and schools from all socio-economic backgrounds with different educational goals. In this work, we developed an open-source learning module for college and high school students, which allows students to build their own robot companion from the ground up. This open platform can be used to provide hands-on experience and introductory knowledge about various aspects of AI, including robotics, machine learning (ML), software engineering, and mechanical engineering. Because of the social and personal nature of a socially assistive robot companion, this module also puts a special emphasis on human-centered AI, enabling students to develop a better understanding of human-AI interaction and AI ethics through hands-on learning activities. With open-source documentation, assembling manuals and affordable materials, students from different socio-economic backgrounds can personalize their learning experience based on their individual educational goals. To evaluate the student-perceived quality of our module, we conducted a usability testing workshop with 15 college students recruited from a minority-serving institution. Our results indicate that our AI module is effective, easy-to-follow, and engaging, and it increases student interest in studying AI/ML and robotics in the future. We hope that this work will contribute toward accessible and engaging AI education in human-AI interaction for college and high school students.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2402.01647

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting > K-12 Education > Secondary School (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases

Cui, Jiaxi, Li, Zongjian, Yan, Yang, Chen, Bohua, Yuan, Li

arXiv.org Artificial IntelligenceJun-28-2023

Large Language Models (LLMs) have shown the potential to revolutionize natural language processing tasks in various domains, sparking great interest in vertical-specific large models. However, unlike proprietary models such as BloombergGPT and FinGPT, which have leveraged their unique data accumulations to make strides in the finance domain, there hasn't not many similar large language models in the Chinese legal domain to facilitate its digital transformation. In this paper, we propose an open-source legal large language model named ChatLaw. Due to the importance of data quality, we carefully designed a legal domain fine-tuning dataset. Additionally, to overcome the problem of model hallucinations in legal data screening during reference data retrieval, we introduce a method that combines vector database retrieval with keyword retrieval to effectively reduce the inaccuracy of relying solely on vector database retrieval. Furthermore, we propose a self-attention method to enhance the ability of large models to overcome errors present in reference data, further optimizing the issue of model hallucinations at the model level and improving the problem-solving capabilities of large models. We also open-sourced our model and part of the data at https://github.com/PKU-YuanGroup/ChatLaw.

artificial intelligence, chatbot, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.16092

Genre: Research Report (0.50)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback