AITopics | Zheng, Jiachen

Collaborating Authors

Zheng, Jiachen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training

Wang, Yuancheng, Zheng, Jiachen, Zhang, Junan, Zhang, Xueyao, Liao, Huan, Wu, Zhizheng

arXiv.org Artificial IntelligenceFeb-5-2025

We introduce Metis, a foundation model for unified speech generation. Unlike previous task-specific or multi-task models, Metis follows a pre-training and fine-tuning paradigm. It is pre-trained on large-scale unlabeled speech data using masked generative modeling and then fine-tuned to adapt to diverse speech generation tasks. Specifically, 1) Metis utilizes two discrete speech representations: SSL tokens derived from speech self-supervised learning (SSL) features, and acoustic tokens directly quantized from waveforms. 2) Metis performs masked generative pre-training on SSL tokens, utilizing 300K hours of diverse speech data, without any additional condition. 3) Through fine-tuning with task-specific conditions, Metis achieves efficient adaptation to various speech generation tasks while supporting multimodal input, even when using limited data and trainable parameters. Experiments demonstrate that Metis can serve as a foundation model for unified speech generation: Metis outperforms state-of-the-art task-specific or multi-task systems across five speech generation tasks, including zero-shot text-to-speech, voice conversion, target speaker extraction, speech enhancement, and lip-to-speech, even with fewer than 20M trainable parameters or 300 times less training data. Audio samples are are available at https://metis-demo.github.io/.

arxiv preprint arxiv, large language model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2502.03128

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

On Diffusion Process in SE(3)-invariant Space

Zhou, Zihan, Liu, Ruiying, Zheng, Jiachen, Wang, Xiaoxue, Yu, Tianshu

arXiv.org Artificial IntelligenceMar-3-2024

Sampling viable 3D structures (e.g., molecules and point clouds) with SE(3)-invariance using diffusion-based models proved promising in a variety of real-world applications, wherein SE(3)-invariant properties can be naturally characterized by the inter-point distance manifold. However, due to the non-trivial geometry, we still lack a comprehensive understanding of the diffusion mechanism within such SE(3)-invariant space. This study addresses this gap by mathematically delineating the diffusion mechanism under SE(3)-invariance, via zooming into the interaction behavior between coordinates and the inter-point distance manifold through the lens of differential geometry. Upon this analysis, we propose accurate and projection-free diffusion SDE and ODE accordingly. Such formulations enable enhancing the performance and the speed of generation pathways; meanwhile offering valuable insights into other systems incorporating SE(3)-invariance.

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv.org Artificial Intelligence

2403.0143

Country:

Asia > China (0.14)
Europe > Denmark (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback