AITopics | Tao, Ming

Collaborating Authors

Tao, Ming

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LLM Gesticulator: Leveraging Large Language Models for Scalable and Controllable Co-Speech Gesture Synthesis

Pang, Haozhou, Ding, Tianwei, He, Lanshan, Tao, Ming, Zhang, Lu, Gan, Qi

arXiv.org Artificial IntelligenceOct-22-2024

In this work, we present LLM Gesticulator, an LLM-based audio-driven co-speech gesture generation framework that synthesizes full-body animations that are rhythmically aligned with the input audio while exhibiting natural movements and editability. Compared to previous work, our model demonstrates substantial scalability. As the size of the backbone LLM model increases, our framework shows proportional improvements in evaluation metrics (a.k.a. scaling law). Our method also exhibits strong controllability where the content, style of the generated gestures can be controlled by text prompt. To the best of our knowledge, LLM gesticulator is the first work that use LLM on the co-speech generation task. Evaluation with existing objective metrics and user studies indicate that our framework outperforms prior works.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.10851

Country:

North America > United States (0.29)
Asia (0.28)

Genre: Overview (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics

Zhao, Zhuoying, Tan, Ziling, Mo, Pinghui, Wang, Xiaonan, Zhao, Dan, Zhang, Xin, Tao, Ming, Liu, Jie

arXiv.org Artificial IntelligenceMar-26-2023

This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of field programmable gate array (FPGA) and application specific integrated circuit (ASIC) working in heterogeneous parallelization. To be specific, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic forces, which is the most computationally expensive part of MD. All other calculations of MD are done using FPGA (Xilinx XC7Z100). It is shown that, to achieve similar-level accuracy, the proposed NvN-based system based on low-end fabrication technologies (180 nm) is 1.6x faster and 10^2-10^3x more energy efficiency than state-of-the-art vN based MLMD using graphics processing units (GPUs) based on much more advanced technologies (12 nm), indicating superiority of the proposed NvN-based heterogeneous parallel architecture.

artificial intelligence, calculation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TCSI.2023.3255199

2303.15474

Genre: Research Report (0.64)

Industry: Semiconductors & Electronics (0.68)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

Tao, Ming, Bao, Bing-Kun, Tang, Hao, Xu, Changsheng

arXiv.org Artificial IntelligenceJan-30-2023

Synthesizing high-fidelity complex images from text is challenging. Based on large pretraining, the autoregressive and diffusion models can synthesize photo-realistic images. Although these large models have shown notable progress, there remain three flaws. 1) These models require tremendous training data and parameters to achieve good performance. 2) The multi-step generation design slows the image synthesis process heavily. 3) The synthesized visual features are difficult to control and require delicately designed prompts. To enable high-quality, efficient, fast, and controllable text-to-image synthesis, we propose Generative Adversarial CLIPs, namely GALIP. GALIP leverages the powerful pretrained CLIP model both in the discriminator and generator. Specifically, we propose a CLIP-based discriminator. The complex scene understanding ability of CLIP enables the discriminator to accurately assess the image quality. Furthermore, we propose a CLIP-empowered generator that induces the visual concepts from CLIP through bridge features and prompts. The CLIP-integrated generator and discriminator boost training efficiency, and as a result, our model only requires about 3% training data and 6% learnable parameters, achieving comparable results to large pretrained autoregressive and diffusion models. Moreover, our model achieves 120 times faster synthesis speed and inherits the smooth latent space from GAN. The extensive experimental results demonstrate the excellent performance of our GALIP. Code is available at https://github.com/tobran/GALIP.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.12959

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback