Zhang, Hongwei
CoKV: Optimizing KV Cache Allocation via Cooperative Game
Sun, Qiheng, Zhang, Hongwei, Xia, Haocheng, Zhang, Jiayao, Liu, Jinfei, Ren, Kui
Large language models (LLMs) have achieved remarkable success in many aspects of human life. However, one of the major challenges in deploying these models is the substantial memory consumption required to store key-value (KV) pairs, which imposes significant resource demands. Recent research has focused on KV cache budget allocation, with several approaches proposing head-level budget distribution by evaluating the importance of individual attention heads. These methods, however, assess the importance of heads independently, overlooking their cooperative contributions within the model, which may result in a deviation from their true impact on model performance. In light of this limitation, we propose CoKV, a novel method that models the cooperation between heads during model inference as a cooperative game. By evaluating the contribution of each head within the cooperative game, CoKV can allocate the cache budget more effectively. Extensive experiments show that CoKV achieves state-of-the-art performance on the LongBench benchmark using the Llama-3-8B-Instruct and Mistral-7B models.
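As a rough illustration of the cooperative-game view, the sketch below scores each head by a Monte Carlo estimate of its Shapley value (average marginal contribution across random head orderings) and splits a KV cache budget proportionally. The `evaluate_model` valuation and the proportional split are hypothetical stand-ins for illustration, not CoKV's exact procedure.

```python
import random

def shapley_head_scores(heads, evaluate_model, num_permutations=100):
    """Monte Carlo estimate of each attention head's Shapley value.

    `heads` is a list of head identifiers; `evaluate_model(coalition)`
    returns a task score (e.g., accuracy) when only the heads in
    `coalition` receive a full KV cache budget. Both are hypothetical
    stand-ins for the paper's actual valuation procedure."""
    scores = {h: 0.0 for h in heads}
    for _ in range(num_permutations):
        order = random.sample(heads, len(heads))  # one random ordering
        coalition, prev_value = [], evaluate_model([])
        for h in order:
            coalition.append(h)
            value = evaluate_model(coalition)
            scores[h] += value - prev_value  # marginal contribution of h
            prev_value = value
    return {h: s / num_permutations for h, s in scores.items()}

def allocate_budget(scores, total_budget):
    """Split a total KV cache budget proportionally to head scores."""
    positive = {h: max(s, 0.0) for h, s in scores.items()}
    norm = sum(positive.values()) or 1.0
    return {h: int(total_budget * s / norm) for h, s in positive.items()}
```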
Learning-Enhanced Safeguard Control for High-Relative-Degree Systems: Robust Optimization under Disturbances and Faults
Wang, Xinyang, Zhang, Hongwei, Wang, Shimin, Xiao, Wei, Guay, Martin
Merely pursuing performance may adversely affect safety, while an overly conservative policy for safe exploration degrades performance. How to balance safety and performance in learning-based control problems is an interesting yet challenging issue. This paper aims to enhance system performance with a safety guarantee when solving reinforcement learning (RL)-based optimal control problems for nonlinear systems subject to high-relative-degree state constraints and unknown time-varying disturbances/actuator faults. First, to combine control barrier functions (CBFs) with RL, a new type of CBF, termed the high-order reciprocal control barrier function (HO-RCBF), is proposed to handle high-relative-degree constraints during the learning process. Then, the concept of gradient similarity is proposed to quantify the relationship between the gradient of safety and the gradient of performance. Finally, gradient manipulation and adaptive mechanisms are introduced into the safe RL framework to enhance performance with a safety guarantee. Two simulation examples illustrate that the proposed safe RL framework can handle high-relative-degree constraints, enhance safety robustness, and improve system performance.
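To make the gradient-similarity idea concrete, here is a minimal sketch: cosine similarity between the safety and performance gradients, plus one plausible manipulation rule that projects away the conflicting component when the two objectives oppose each other. The projection rule is an assumption for illustration, not necessarily the paper's mechanism.

```python
import numpy as np

def gradient_similarity(g_perf, g_safe):
    """Cosine similarity between performance and safety gradients."""
    denom = np.linalg.norm(g_perf) * np.linalg.norm(g_safe) + 1e-12
    return float(g_perf @ g_safe) / denom

def manipulate_gradient(g_perf, g_safe):
    """Illustrative gradient manipulation: when the two objectives
    conflict (negative similarity), remove from the performance
    gradient its component opposing safety. This projection is an
    assumption for illustration, not the paper's exact mechanism."""
    if gradient_similarity(g_perf, g_safe) < 0.0:
        g_perf = g_perf - (g_perf @ g_safe) / (g_safe @ g_safe + 1e-12) * g_safe
    return g_perf
```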
How Collective Intelligence Emerges in a Crowd of People Through Learned Division of Labor: A Case Study
Wang, Dekun, Zhang, Hongwei
This paper investigates the factors fostering collective intelligence (CI) through a case study of LinYi's Experiment, in which over 2000 human players collectively controlled an avatar car. By conducting theoretical analysis and replicating observed behaviors through numerical simulations, we demonstrate how self-organized division of labor (DOL) among individuals fosters the emergence of CI, and we identify two essential conditions for fostering CI by formulating the problem as a stability problem for a Markov Jump Linear System (MJLS). These conditions, independent of external stimuli, emphasize the importance of both elite and common players in fostering CI. Additionally, we propose an index for the emergence of CI and a distributed method for estimating joint actions, enabling individuals to learn their optimal social roles without global action information about the whole crowd.
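For background on the MJLS formulation, the sketch below implements the textbook mean-square stability test for a discrete-time MJLS (Costa, Fragoso, and Marques): the system is mean-square stable iff the spectral radius of (P^T kron I) blkdiag(A_i kron A_i) is below one. This is the standard criterion, not the paper's specific emergence conditions.

```python
import numpy as np
from scipy.linalg import block_diag

def mjls_mean_square_stable(A_modes, P):
    """Mean-square stability test for a discrete-time Markov Jump
    Linear System x_{k+1} = A_{theta_k} x_k, with mode matrices
    A_modes and mode transition matrix P (rows summing to 1).

    Standard criterion: mean-square stable iff the spectral radius of
    (P^T kron I) @ blkdiag(A_i kron A_i) is strictly less than 1."""
    n = A_modes[0].shape[0]
    D = block_diag(*[np.kron(A, A) for A in A_modes])
    M = np.kron(P.T, np.eye(n * n)) @ D
    return max(abs(np.linalg.eigvals(M))) < 1.0
```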
SNR-EQ-JSCC: Joint Source-Channel Coding with SNR-Based Embedding and Query
Zhang, Hongwei, Tao, Meixia
Coping with the impact of dynamic channels is a critical issue in joint source-channel coding (JSCC)-based semantic communication systems. In this paper, we propose a lightweight channel-adaptive semantic coding architecture called SNR-EQ-JSCC. It is built upon the generic Transformer model and achieves channel adaptation (CA) by Embedding the signal-to-noise ratio (SNR) into the attention blocks and dynamically adjusting attention scores through channel-adaptive Queries. Meanwhile, penalty terms are introduced into the loss function to stabilize the training process. Considering that instantaneous SNR feedback may be imperfect, we propose an alternative method that uses only the average SNR and requires no retraining of SNR-EQ-JSCC. Simulation results on image transmission demonstrate that the proposed SNR-EQ-JSCC outperforms the state-of-the-art SwinJSCC in peak signal-to-noise ratio (PSNR) and perception metrics while requiring only 0.05% of the storage overhead and 6.38% of the computational complexity for CA. Moreover, the channel-adaptive query method yields significant improvements in perception metrics. When instantaneous SNR feedback is imperfect, SNR-EQ-JSCC using only the average SNR still surpasses the baseline schemes.
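A minimal sketch of the embedding-and-query idea: a scalar SNR is embedded by a small network and added to the attention queries, so attention scores adapt to the channel. The layer sizes and the additive query bias are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SNRAdaptiveAttention(nn.Module):
    """Illustrative multi-head attention block in which an embedded
    SNR value biases the queries, making attention scores channel
    adaptive. The design here is a sketch, not SNR-EQ-JSCC itself."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.snr_embed = nn.Sequential(  # maps scalar SNR (dB) to a query bias
            nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x, snr_db):
        # x: (batch, seq, dim); snr_db: (batch,) channel SNR in dB
        q_bias = self.snr_embed(snr_db.unsqueeze(-1)).unsqueeze(1)  # (batch, 1, dim)
        q = x + q_bias  # channel-adaptive queries
        out, _ = self.attn(q, x, x, need_weights=False)
        return out
```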
Learning to Rank for Maps at Airbnb
Haldar, Malay, Zhang, Hongwei, Bellare, Kedar, Chen, Sherry, Banerjee, Soumyadip, Wang, Xiaotang, Abdool, Mustafa, Gao, Huiji, Tapadia, Pavan, He, Liwei, Katariya, Sanjeev
As a two-sided marketplace, Airbnb brings together hosts who own listings for rent with prospective guests from around the globe. Results from a guest's search for listings are displayed primarily through two interfaces: (1) as a list of rectangular cards showing the listing image, price, rating, and other details, referred to as list-results, and (2) as oval pins on a map showing the listing price, called map-results. Since their inception, both interfaces have used the same ranking algorithm, which orders listings by their booking probabilities and selects the top listings for display. But some of the basic assumptions underlying ranking, built for a world where search results are presented as lists, simply break down for maps. This paper describes how we rebuilt ranking for maps by revising the mathematical foundations of how users interact with search results. Our iterative and experiment-driven approach led us through a path full of twists and turns, ending in a unified theory for the two interfaces. Our journey shows how assumptions taken for granted when designing machine learning algorithms may not apply equally across all user interfaces, and how they can be adapted. The net impact was one of the largest improvements in user experience for Airbnb, which we discuss as a series of experimental validations.
StructComp: Substituting Propagation with Structural Compression in Training Graph Contrastive Learning
Zhang, Shengzhong, Yang, Wenjie, Cao, Xinyuan, Zhang, Hongwei, Huang, Zengfeng
Graph contrastive learning (GCL) has become a powerful tool for learning graph data, but its scalability remains a significant challenge. In this work, we propose a simple yet effective training framework called Structural Compression (StructComp) to address this issue. Inspired by a sparse low-rank approximation of the diffusion matrix, StructComp trains the encoder on compressed nodes. This spares the encoder from performing any message passing during the training stage and significantly reduces the number of sample pairs in the contrastive loss. We theoretically prove that the original GCL loss can be approximated by the contrastive loss computed by StructComp. Moreover, StructComp can be regarded as an additional regularization term for GCL models, resulting in a more robust encoder. Empirical studies on seven benchmark datasets show that StructComp greatly reduces time and memory consumption while improving model performance compared with vanilla GCL models and scalable training methods.
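A minimal sketch of the compression step, assuming a graph partition is given: node features are mean-pooled into cluster-level "compressed nodes", on which a message-passing-free encoder (e.g., an MLP) can then be trained. Mean pooling is one natural choice of compression matrix; the paper's exact construction may differ.

```python
import numpy as np

def compress_graph(X, assignment, num_clusters):
    """Mean-pool node features into cluster-level compressed nodes.

    X: (num_nodes, d) feature matrix; assignment: (num_nodes,) array
    mapping each node to a cluster index in [0, num_clusters). The
    encoder is trained on the returned (num_clusters, d) matrix, so no
    message passing over the original graph is needed during training."""
    X_c = np.zeros((num_clusters, X.shape[1]))
    counts = np.bincount(assignment, minlength=num_clusters)
    np.add.at(X_c, assignment, X)          # sum features per cluster
    return X_c / np.maximum(counts, 1)[:, None]  # mean per cluster
```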
Understanding Community Bias Amplification in Graph Representation Learning
Zhang, Shengzhong, Yang, Wenjie, Zhang, Yimin, Zhang, Hongwei, Yan, Divin, Huang, Zengfeng
In this work, we discover a phenomenon of community bias amplification in graph representation learning, which refers to the exacerbation of performance bias between different classes by graph representation learning. We conduct an in-depth theoretical study of this phenomenon from a novel spectral perspective. Our analysis suggests that structural bias between communities results in varying local convergence speeds for node embeddings, which leads to bias amplification in the classification results of downstream tasks. Based on these theoretical insights, we propose random graph coarsening, which is proven effective in dealing with this issue. Finally, we propose a novel graph contrastive learning model called Random Graph Coarsening Contrastive Learning (RGCCL), which uses random coarsening as data augmentation and mitigates community bias by contrasting the coarsened graph with the original graph. Extensive experiments on various datasets demonstrate the advantage of our method in dealing with community bias amplification.
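As a sketch of random coarsening used as augmentation, the snippet below contracts randomly chosen edges with a union-find structure until a target number of super-nodes remains, producing a node-to-super-node mapping that defines the coarsened view. This simple edge-contraction scheme is an illustration; RGCCL's coarsening may differ in detail.

```python
import random

def random_coarsen(edges, num_nodes, ratio=0.5):
    """Randomly contract edges until roughly ratio * num_nodes
    super-nodes remain; returns a node -> super-node mapping that can
    serve as a coarsened view for contrastive learning."""
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    remaining = num_nodes
    for u, v in random.sample(edges, len(edges)):
        if remaining <= ratio * num_nodes:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # contract this edge into one super-node
            remaining -= 1
    return [find(u) for u in range(num_nodes)]
```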
Learning nonlinear dynamics in synchronization of knowledge-based leader-following networks
Wang, Shimin, Meng, Xiangyu, Zhang, Hongwei, Lewis, Frank L.
The knowledge-based leader-following synchronization problem for heterogeneous nonlinear multi-agent systems is challenging because the leader's dynamics are unknown to all follower nodes. This paper proposes a learning-based fully distributed observer for a class of nonlinear leader systems that can simultaneously learn the leader's dynamics and states. The class of leader dynamics considered here does not require a bounded Jacobian matrix. Based on this learning-based distributed observer, we further synthesize an adaptive distributed control law that solves the leader-following synchronization problem for multiple Euler-Lagrange systems subject to an uncertain nonlinear leader system. The results are illustrated by a simulation example.
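A rough sketch of one observer update, assuming each follower combines a learned dynamics model with consensus and pinning terms; the gains, basis functions, and update structure here are illustrative assumptions, not the paper's design.

```python
import numpy as np

def observer_step(eta, W_hat, phi, x0, adj, pin, mu, dt):
    """One Euler step of an illustrative learning-based distributed
    observer: node i propagates its leader-state estimate with a
    learned model W_hat[i] @ phi(eta[i]) plus consensus/pinning
    coupling. All structure and gains are assumptions.

    eta:   (N, n) follower estimates of the leader state
    W_hat: (N, n, m) per-node learned weights for the basis phi
    x0:    (n,) leader state, visible only to pinned nodes
    adj:   (N, N) adjacency matrix; pin: (N,) pinning gains"""
    N = eta.shape[0]
    eta_next = eta.copy()
    for i in range(N):
        consensus = sum(adj[i, j] * (eta[j] - eta[i]) for j in range(N))
        pinning = pin[i] * (x0 - eta[i])
        eta_next[i] += dt * (W_hat[i] @ phi(eta[i]) + mu * (consensus + pinning))
    return eta_next
```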
AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations
Zhang, Hongwei, Zou, Weidong, Zhao, Hongbo, Ming, Qi, Yan, Tijin, Xia, Yuanqing, Cao, Weipeng
Adaptive optimization methods have been widely used in deep learning. They scale the learning rate adaptively according to past gradients, which has been shown to be effective in accelerating convergence. However, they suffer from poor generalization performance compared with SGD. Recent studies point out that smoothing the exponential gradient noise leads to a generalization degeneration phenomenon. Inspired by this, we propose AdaL, which applies a transformation to the original gradient. AdaL accelerates convergence by amplifying the gradient in the early stage, and dampens oscillation and stabilizes the optimization by shrinking the gradient later. Such a modification alleviates the smoothness of the gradient noise, which yields better generalization performance. We theoretically prove the convergence of AdaL and demonstrate its effectiveness on several benchmarks.
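As an illustration of the amplify-early/shrink-late idea, the sketch below applies a decaying time-dependent scale to the raw gradient before an Adam-style update. The schedule (t0/t)**alpha, which amplifies gradients while t < t0 and shrinks them afterwards, is a stand-in for illustration, not the transformation defined in the paper.

```python
import numpy as np

def adal_like_step(param, grad, state, t, lr=1e-3, t0=100, alpha=0.5,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step on a transformed gradient. `state` holds
    the first/second moment arrays 'm' and 'v' (initialized to zeros);
    t is the 1-based step count. The (t0/t)**alpha schedule is an
    illustrative assumption, not AdaL's exact transformation."""
    g = grad * (t0 / t) ** alpha              # transform the raw gradient
    state['m'] = beta1 * state['m'] + (1 - beta1) * g
    state['v'] = beta2 * state['v'] + (1 - beta2) * g * g
    m_hat = state['m'] / (1 - beta1 ** t)     # bias-corrected first moment
    v_hat = state['v'] / (1 - beta2 ** t)     # bias-corrected second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```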
ScoreGrad: Multivariate Probabilistic Time Series Forecasting with Continuous Energy-based Generative Models
Yan, Tijin, Zhang, Hongwei, Zhou, Tong, Zhan, Yufeng, Xia, Yuanqing
Multivariate time series prediction has attracted much attention because of its wide range of applications, such as intelligent transportation and AIOps. Generative models have achieved impressive results in time series modeling because they can model the data distribution and take noise into consideration. However, many existing works cannot be widely used because of constraints on the functional form of the generative model or sensitivity to hyperparameters. In this paper, we propose ScoreGrad, a multivariate probabilistic time series forecasting framework based on continuous energy-based generative models. ScoreGrad is composed of a time series feature extraction module and a conditional stochastic differential equation (SDE)-based score matching module. Prediction is achieved by iteratively solving the reverse-time SDE. To the best of our knowledge, ScoreGrad is the first continuous energy-based generative model used for time series forecasting. Furthermore, ScoreGrad achieves state-of-the-art results on six real-world datasets. The impact of hyperparameters and sampler types on performance is also explored. Code is available at https://github.com/yantijin/ScoreGradPred.
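A minimal sketch of the sampling stage, assuming a VP-type SDE: prediction draws samples by integrating the reverse-time SDE from t = 1 down to t = 0 with Euler-Maruyama steps driven by a trained score network. The VP-SDE choice, `score_fn`, `beta_fn`, and the step count are illustrative assumptions.

```python
import numpy as np

def reverse_sde_sample(score_fn, x_T, beta_fn, num_steps=1000):
    """Euler-Maruyama sampler for the reverse-time VP-SDE
        dx = [-beta(t)/2 * x - beta(t) * score(x, t)] dt + sqrt(beta(t)) dw,
    integrated backward from t = 1 to t = 0. `score_fn(x, t)` stands in
    for the trained conditional score network."""
    x = x_T                       # start from Gaussian noise at t = 1
    dt = 1.0 / num_steps
    for i in range(num_steps, 0, -1):
        t = i / num_steps
        beta = beta_fn(t)
        drift = -0.5 * beta * x - beta * score_fn(x, t)
        x = x - drift * dt + np.sqrt(beta * dt) * np.random.randn(*x.shape)
    return x
```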