AITopics | Wei, Xiaobao

Collaborating Authors

Wei, Xiaobao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation

Chen, Peng, Wei, Xiaobao, Lu, Ming, Chen, Hui, Tian, Feng

arXiv.org Artificial IntelligenceMar-23-2025

Real-time speech-driven 3D facial animation has been attractive in academia and industry. Traditional methods mainly focus on learning a deterministic mapping from speech to animation. Recent approaches start to consider the nondeterministic fact of speech-driven 3D face animation and employ the diffusion model for the task. Existing diffusion-based methods can improve the diversity of facial animation. However, personalized speaking styles conveying accurate lip language is still lacking, besides, efficiency and compactness still need to be improved. In this work, we propose DiffusionTalker to address the above limitations via personalizer-guided distillation. In terms of personalization, we introduce a contrastive personalizer that learns identity and emotion embeddings to capture speaking styles from audio. We further propose a personalizer enhancer during distillation to enhance the influence of embeddings on facial animation. For efficiency, we use iterative distillation to reduce the steps required for animation generation and achieve more than 8x speedup in inference. To achieve compactness, we distill the large teacher model into a smaller student model, reducing our model's storage by 86.4\% while minimizing performance loss. After distillation, users can derive their identity and emotion embeddings from audio to quickly create personalized animations that reflect specific speaking styles. Extensive experiments are conducted to demonstrate that our method outperforms state-of-the-art methods. The code will be released at: https://github.com/ChenVoid/DiffusionTalker.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.18159

Country:

Asia > China (0.15)
Asia > Middle East > Israel (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.51)

Technology:

Information Technology > Graphics > Animation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)

Add feedback

$\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

Huang, Nan, Wei, Xiaobao, Zheng, Wenzhao, An, Pengju, Lu, Ming, Zhan, Wei, Tomizuka, Masayoshi, Keutzer, Kurt, Zhang, Shanghang

arXiv.org Artificial IntelligenceMay-30-2024

Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements for effective reconstruction, limiting their applications for in-the-wild scenarios. To facilitate efficient 3D scene reconstruction without costly annotations, we propose a self-supervised street Gaussian ($\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency. We represent each scene with 3D Gaussians to preserve the explicitness and further accompany them with a spatial-temporal field network to compactly model the 4D dynamics. We conduct extensive experiments on the challenging Waymo-Open dataset to evaluate the effectiveness of our method. Our $\textit{S}^3$Gaussian demonstrates the ability to decompose static and dynamic scenes and achieves the best performance without using 3D annotations. Code is available at: https://github.com/nnanhuang/S3Gaussian/.

artificial intelligence, computer vision and pattern recognition, gaussian, (13 more...)

arXiv.org Artificial Intelligence

2405.20323

Country: Asia (0.14)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.73)
Information Technology > Robotics & Automation (0.63)
Automobiles & Trucks (0.63)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.87)

Add feedback

MoEC: Mixture of Experts Implicit Neural Compression

Zhao, Jianchen, Tseng, Cheng-Ching, Lu, Ming, An, Ruichuan, Wei, Xiaobao, Sun, He, Zhang, Shanghang

arXiv.org Artificial IntelligenceDec-3-2023

Emerging Implicit Neural Representation (INR) is a promising data compression technique, which represents the data using the parameters of a Deep Neural Network (DNN). Existing methods manually partition a complex scene into local regions and overfit the INRs into those regions. However, manually designing the partition scheme for a complex scene is very challenging and fails to jointly learn the partition and INRs. To solve the problem, we propose MoEC, a novel implicit neural compression method based on the theory of mixture of experts. Specifically, we use a gating network to automatically assign a specific INR to a 3D point in the scene. The gating network is trained jointly with the INRs of different local regions. Compared with block-wise and tree-structured partitions, our learnable partition can adaptively find the optimal partition in an end-to-end manner. We conduct detailed experiments on massive and diverse biomedical data to demonstrate the advantages of MoEC against existing approaches. In most of experiment settings, we have achieved state-of-the-art results. Especially in cases of extreme compression ratios, such as 6000x, we are able to uphold the PSNR of 48.16.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2312.01361

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Communications > Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback