AITopics | Wei, Zhen

Collaborating Authors

Wei, Zhen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Jia, Dongya, Chen, Zhuo, Chen, Jiawei, Du, Chenpeng, Wu, Jian, Cong, Jian, Zhuang, Xiaobin, Li, Chumin, Wei, Zhen, Wang, Yuping, Wang, Yuxuan

arXiv.org Artificial IntelligenceFeb-14-2025

Several recent studies have attempted to autoregressively generate continuous speech representations without discrete speech tokens by combining diffusion and autoregressive models, yet they often face challenges with excessive computational loads or suboptimal outcomes. In this work, we propose Diffusion Transformer Autoregressive Modeling (DiTAR), a patch-based autoregressive framework combining a language model with a diffusion transformer. This approach significantly enhances the efficacy of autoregressive models for continuous tokens and reduces computational demands. DiTAR utilizes a divide-and-conquer strategy for patch generation, where the language model processes aggregated patch embeddings and the diffusion transformer subsequently generates the next patch based on the output of the language model. For inference, we propose defining temperature as the time point of introducing noise during the reverse diffusion ODE to balance diversity and determinism. We also show in the extensive scaling analysis that DiTAR has superb scalability. In zero-shot speech generation, DiTAR achieves state-of-the-art performance in robustness, speaker similarity, and naturalness.

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2502.0393

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration

Chang, Jingfei, Lu, Yang, Xue, Ping, Wei, Xing, Wei, Zhen

arXiv.org Artificial IntelligenceOct-2-2020

To apply deep CNNs to mobile terminals and portable devices, many scholars have recently worked on the compressing and accelerating deep convolutional neural networks. Based on this, we propose a novel uniform channel pruning (UCP) method to prune deep CNN, and the modified squeeze-and-excitation blocks (MSEB) is used to measure the importance of the channels in the convolutional layers. The unimportant channels, including convolutional kernels related to them, are pruned directly, which greatly reduces the storage cost and the number of calculations. There are two types of residual blocks in ResNet. For ResNet with bottlenecks, we use the pruning method with traditional CNN to trim the 3x3 convolutional layer in the middle of the blocks. For ResNet with basic residual blocks, we propose an approach to consistently prune all residual blocks in the same stage to ensure that the compact network structure is dimensionally correct. Considering that the network loses considerable information after pruning and that the larger the pruning amplitude is, the more information that will be lost, we do not choose fine-tuning but retrain from scratch to restore the accuracy of the network after pruning. Finally, we verified our method on CIFAR-10, CIFAR-100 and ILSVRC-2012 for image classification. The results indicate that the performance of the compact network after retraining from scratch, when the pruning rate is small, is better than the original network. Even when the pruning amplitude is large, the accuracy can be maintained or decreased slightly. On the CIFAR-100, when reducing the parameters and FLOPs up to 82% and 62% respectively, the accuracy of VGG-19 even improved by 0.54% after retraining.

deep learning, neural network, residual block, (17 more...)

arXiv.org Artificial Intelligence

2010.01251

Country: Asia > China (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback