
Collaborating Authors

 Zhang, Shunkang


ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

arXiv.org Artificial Intelligence

Sparse Mixture of Experts (MoE) models, while outperforming dense Large Language Models (LLMs) in terms of performance, face significant deployment challenges during inference due to their high memory demands. Existing offloading techniques, which involve swapping activated and idle experts between the GPU and CPU, often suffer from rigid expert caching mechanisms. These mechanisms fail to adapt to dynamic routing, leading to inefficient cache utilization, or incur prohibitive costs for prediction training. To tackle these inference-specific challenges, we introduce ExpertFlow, a comprehensive system specifically designed to enhance inference efficiency by accommodating flexible routing and enabling efficient expert scheduling between CPU and GPU. This reduces overhead and boosts system performance. Central to our approach is a predictive routing path-based offloading mechanism that utilizes a lightweight predictor to accurately forecast routing paths before computation begins. This proactive strategy allows for real-time error correction in expert caching, significantly increasing cache hit ratios and reducing the frequency of expert transfers, thereby minimizing I/O overhead. Additionally, we implement a dynamic token scheduling strategy that optimizes MoE inference by rearranging input tokens across different batches. This method not only reduces the number of activated experts per batch but also improves computational efficiency. Our extensive experiments demonstrate that ExpertFlow achieves up to 93.72% GPU memory savings and enhances inference speed by 2 to 10 times compared to baseline methods, highlighting its effectiveness and utility as a robust solution for resource-constrained inference scenarios.
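As a rough illustration of the caching idea described above (a minimal sketch under assumptions, not ExpertFlow's actual implementation; the class name, the capacity parameter, and the load_expert callback are hypothetical), a predictive expert cache can prefetch the experts a lightweight predictor expects the router to activate and fall back to on-demand loading only when the prediction misses:

    # Hypothetical sketch of a predictive expert cache (illustrative only).
    # Predicted experts are prefetched into a fixed-size GPU-resident cache;
    # prediction errors are corrected at compute time with an extra transfer.
    from collections import OrderedDict

    class PredictiveExpertCache:
        def __init__(self, capacity, load_expert):
            self.capacity = capacity          # max experts resident on the GPU
            self.load_expert = load_expert    # callback: expert_id -> weights (CPU -> GPU copy)
            self.cache = OrderedDict()        # expert_id -> weights, kept in LRU order
            self.hits = 0
            self.misses = 0

        def prefetch(self, predicted_ids):
            """Bring predicted experts onto the GPU before computation starts."""
            for eid in predicted_ids:
                self._fetch(eid)

        def get(self, eid):
            """Fetch an expert at compute time, correcting any prediction error."""
            if eid in self.cache:
                self.hits += 1
                self.cache.move_to_end(eid)   # mark as most recently used
                return self.cache[eid]
            self.misses += 1
            return self._fetch(eid)

        def _fetch(self, eid):
            if eid in self.cache:
                self.cache.move_to_end(eid)
                return self.cache[eid]
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used expert
            self.cache[eid] = self.load_expert(eid)
            return self.cache[eid]

    # Toy usage: 8 experts, room for 4 on the "GPU", and a perfect predictor.
    cache = PredictiveExpertCache(capacity=4, load_expert=lambda eid: f"weights[{eid}]")
    routing_path = [1, 3, 5, 1]               # experts the router will actually pick
    cache.prefetch(routing_path)              # proactive load based on the prediction
    outputs = [cache.get(eid) for eid in routing_path]
    print(cache.hits, cache.misses)           # -> 4 0 when the prediction is exact

The point of the prefetch/get split is that a wrong prediction only costs one extra transfer at compute time, while a correct prediction turns every expert access into a cache hit.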


Kimad: Adaptive Gradient Compression with Bandwidth Awareness

arXiv.org Artificial Intelligence

In distributed training, communication often emerges as a bottleneck. In response, we introduce Kimad, a solution that offers adaptive gradient compression. By consistently monitoring bandwidth, Kimad refines compression ratios to match specific neural network layer requirements. Our exhaustive tests and proofs confirm Kimad's outstanding performance, establishing it as a benchmark in adaptive compression for distributed deep learning.
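To make the bandwidth-aware idea concrete (an illustrative sketch only, not Kimad's actual rule; choose_ratio, the per-step time budget, and the top-k compressor are assumptions of this example), one simple policy picks a per-layer keep ratio so that the compressed gradient roughly fits the byte budget implied by the currently measured bandwidth:

    # Hypothetical sketch of bandwidth-aware top-k gradient compression (illustrative only).
    import numpy as np

    def choose_ratio(layer_size, bandwidth_bytes_per_s, step_budget_s, floor=0.01):
        """Fraction of entries to keep so the layer transfers within the time budget."""
        budget_bytes = bandwidth_bytes_per_s * step_budget_s
        ratio = budget_bytes / (layer_size * 4)       # 4 bytes per float32 value
        return float(np.clip(ratio, floor, 1.0))

    def topk_compress(grad, ratio):
        """Keep the largest-magnitude entries; return their indices and values."""
        k = max(1, int(grad.size * ratio))
        idx = np.argpartition(np.abs(grad), -k)[-k:]
        return idx, grad[idx]

    def decompress(idx, vals, size):
        out = np.zeros(size, dtype=vals.dtype)
        out[idx] = vals
        return out

    # Toy usage: the same layer is compressed more aggressively as bandwidth drops.
    grad = np.random.randn(1_000_000).astype(np.float32)
    for bw in (1e9, 1e8, 1e7):                        # bytes/s, e.g. from a bandwidth monitor
        r = choose_ratio(grad.size, bw, step_budget_s=0.01)
        idx, vals = topk_compress(grad, r)
        restored = decompress(idx, vals, grad.size)
        err = np.linalg.norm(grad - restored) / np.linalg.norm(grad)
        print(f"bandwidth={bw:.0e} B/s  keep ratio={r:.3f}  sent={vals.size}  rel. error={err:.3f}")

When measured bandwidth drops, the keep ratio shrinks and fewer values are sent; when bandwidth recovers, compression relaxes back toward transmitting the full gradient.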


Wasserstein-Wasserstein Auto-Encoders

arXiv.org Machine Learning

To address the challenges in learning deep generative models (e.g., the blurriness of variational auto-encoders and the instability of training generative adversarial networks), we propose a novel deep generative model, named Wasserstein-Wasserstein auto-encoders (WWAE). We formulate WWAE as the minimization of the penalized optimal transport between the target distribution and the generated distribution. By noticing that both the prior $P_Z$ and the aggregated posterior $Q_Z$ of the latent code $Z$ can be well captured by Gaussians, the proposed WWAE utilizes the closed form of the squared Wasserstein-2 distance between two Gaussians in the optimization process. As a result, WWAE does not suffer from the sampling burden and is computationally efficient by leveraging the reparameterization trick. Numerical results on multiple benchmark datasets, including MNIST, Fashion-MNIST and CelebA, show that WWAE learns better latent structures than VAEs and generates samples of better visual quality, with lower FID scores than VAEs and GANs.
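For reference, the closed form invoked above is the standard squared Wasserstein-2 distance between two Gaussians,

$$ W_2^2\big(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\big) = \|m_1-m_2\|_2^2 + \operatorname{Tr}\!\Big(\Sigma_1+\Sigma_2-2\big(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2}\big)^{1/2}\Big). $$

With diagonal covariances, as in the usual Gaussian reparameterization, the trace term reduces to $\sum_i(\sigma_{1,i}-\sigma_{2,i})^2$ over per-coordinate standard deviations, so the objective can be evaluated without sampling or matrix square roots.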


Deep Generative Learning via Variational Gradient Flow

arXiv.org Machine Learning

Learning the generative model, i.e., the underlying data-generating distribution, from large amounts of data is one of the fundamental tasks in machine learning and statistics [46]. Recent advances in deep generative models have provided novel techniques for unsupervised and semi-supervised learning, with broad applications ranging from image synthesis [44], semantic image editing [60], and image-to-image translation [61] to low-level image processing [29]. Implicit deep generative models are a powerful and flexible framework for approximating the target distribution by learning deep samplers [38], with generative adversarial networks (GANs) [16] and likelihood-based models, such as variational auto-encoders (VAEs) [23] and flow-based methods [11], as their main representatives. These models learn a deterministic or stochastic nonlinear mapping that transforms low-dimensional latent samples from a simple reference distribution into samples that closely match the target distribution. GANs set up a minimax two-player game between a generator and a discriminator: during training, the generator transforms samples from the reference distribution into samples intended to deceive the discriminator, while the discriminator performs a differentiable two-sample test to distinguish generated samples from observed samples. The objective of vanilla GANs amounts to the Jensen-Shannon (JS) divergence between the learned distribution and the target distribution. Vanilla GANs generate sharp image samples but suffer from training instability [3]. A myriad of extensions to vanilla GANs have been investigated, both theoretically and empirically, in order to achieve stable training and high-quality sample generation.
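For context, the vanilla GAN criterion referred to above is the minimax value function of [16],

$$ \min_G \max_D \; \mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_z}\big[\log\big(1-D(G(z))\big)\big], $$

and at the optimal discriminator this value equals $2\,\mathrm{JS}(p_{\mathrm{data}}\,\|\,p_g)-\log 4$, which is why training the generator amounts to minimizing the Jensen-Shannon divergence between the generated distribution $p_g$ and the target distribution.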