Yang, Jing
A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters
Xue, Chunyu, Cui, Weihao, Zhao, Han, Chen, Quan, Zhang, Shulai, Yang, Pengyu, Yang, Jing, Li, Shaobo, Guo, Minyi
Joint consideration of scheduling and adaptive parallelism offers great opportunities for improving the training efficiency of large models on heterogeneous GPU clusters. However, integrating adaptive parallelism into a cluster scheduler expands the cluster scheduling space. The new space is the product of the original scheduling space and the parallelism exploration space of adaptive parallelism (itself a product of pipeline, data, and tensor parallelism). The exponentially enlarged scheduling space and the ever-changing optimal parallelism plan of adaptive parallelism together create a tension between low-overhead and accurate performance data acquisition for efficient cluster scheduling. This paper presents Crius, a training system for efficiently scheduling multiple large models with adaptive parallelism in a heterogeneous cluster. Crius proposes a novel scheduling granularity called Cell, which represents a job with deterministic resources and pipeline stages. The exploration space of a Cell shrinks to the product of only data and tensor parallelism, exposing the potential for accurate and low-overhead performance estimation. Crius then accurately estimates the performance of Cells and efficiently schedules training jobs. When a Cell is selected as a scheduling choice, the job it represents runs with the optimal parallelism plan found during exploration. Experimental results show that Crius reduces job completion time by up to 48.9% and schedules large models with up to 1.49x cluster throughput improvement.
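The abstract does not give Crius's algorithm, but the Cell idea can be illustrated with a minimal sketch: once a Cell fixes the GPU count and the number of pipeline stages, only the data-parallel and tensor-parallel degrees remain to be explored. The function name `enumerate_cell_plans` is hypothetical.

```python
# Minimal sketch (not the Crius implementation): with GPUs and pipeline
# stages fixed by a Cell, the remaining search is only over the
# (data parallel, tensor parallel) degrees that fill each stage.

def enumerate_cell_plans(num_gpus: int, pipeline_stages: int):
    """Yield candidate (dp, tp) degrees for a Cell with fixed resources."""
    if num_gpus % pipeline_stages != 0:
        return
    gpus_per_stage = num_gpus // pipeline_stages
    for tp in range(1, gpus_per_stage + 1):
        if gpus_per_stage % tp == 0:
            yield gpus_per_stage // tp, tp

# Example: a Cell with 8 GPUs and 2 pipeline stages exposes only 3 plans
# instead of the full pipeline x data x tensor product.
for dp, tp in enumerate_cell_plans(8, 2):
    print(f"data parallel = {dp}, tensor parallel = {tp}")
```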
Best Arm Identification for Prompt Learning under a Limited Budget
Shi, Chengshuai, Yang, Kun, Yang, Jing, Shen, Cong
The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically learning suitable prompts. However, while many effective methods have been proposed, the cost incurred during the learning process (e.g., accessing the LLM and evaluating the responses) has not been considered. To overcome this limitation, this work explicitly incorporates a finite budget constraint into prompt learning. Towards developing principled solutions, a novel connection is established between prompt learning and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB). Based on this connection, a general framework TRIPLE (besT aRm Identification for Prompt LEarning) is proposed to systematically harness the power of BAI-FB in prompt learning. Unique characteristics of prompt learning further lead to two embedding-based enhancements of TRIPLE that exploit the ideas of clustering and function approximation. Extensive experiments on multiple well-adopted tasks using both GPT-3.5 and Llama2 demonstrate the significant performance improvement of TRIPLE over previous baselines while satisfying the limited budget constraints.
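The prompt-learning-as-BAI connection can be made concrete with a generic fixed-budget algorithm. The sketch below uses sequential halving, a standard BAI-FB routine, with each candidate prompt as an arm; it is not TRIPLE itself, and `score_prompt` is a placeholder for an LLM query plus response evaluation.

```python
import math
import random

def sequential_halving(prompts, budget, score_prompt):
    """Generic fixed-budget best-arm identification over candidate prompts:
    each evaluation spends one unit of budget (an LLM call plus scoring)."""
    arms = list(prompts)
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        pulls = max(1, budget // (len(arms) * rounds))
        means = {a: sum(score_prompt(a) for _ in range(pulls)) / pulls for a in arms}
        arms = sorted(arms, key=means.get, reverse=True)[: max(1, len(arms) // 2)]
    return arms[0]

# Toy usage with a synthetic scorer standing in for LLM evaluation.
best = sequential_halving(
    prompts=[f"prompt-{i}" for i in range(8)],
    budget=160,
    score_prompt=lambda p: random.gauss(0.5 + 0.05 * int(p.split("-")[1]), 0.1),
)
print(best)
```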
Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment
Yang, Linyao, Chen, Hongyang, Wang, Xiao, Yang, Jing, Wang, Fei-Yue, Liu, Han
Entity alignment, which is a prerequisite for creating a more comprehensive Knowledge Graph (KG), involves pinpointing equivalent entities across disparate KGs. Contemporary methods for entity alignment have predominantly utilized knowledge embedding models to procure entity embeddings that encapsulate various similarities: structural, relational, and attributive. These embeddings are then integrated through attention-based information fusion mechanisms. Despite this progress, effectively harnessing multifaceted information remains challenging due to inherent heterogeneity. Moreover, while Large Language Models (LLMs) have exhibited exceptional performance across diverse downstream tasks by implicitly capturing entity semantics, this implicit knowledge has yet to be exploited for entity alignment. In this study, we propose a Large Language Model-enhanced Entity Alignment framework (LLMEA), integrating structural knowledge from KGs with semantic knowledge from LLMs to enhance entity alignment. Specifically, LLMEA identifies candidate alignments for a given entity by considering both embedding similarities between entities across KGs and edit distances to a virtual equivalent entity. It then engages an LLM iteratively, posing a series of multiple-choice questions to draw upon the LLM's inference capability. The final prediction of the equivalent entity is derived from the LLM's output. Experiments conducted on three public datasets reveal that LLMEA surpasses leading baseline models. Additional ablation studies underscore the efficacy of our proposed framework.
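A minimal sketch of the candidate-then-question pattern described above, not the LLMEA implementation: embedding similarity and a surface-form score are combined to shortlist candidates, which are then packed into a multiple-choice prompt. The 0.5/0.5 weighting and the function names are illustrative assumptions.

```python
import numpy as np
from difflib import SequenceMatcher  # stand-in for an edit-distance routine

def candidate_alignments(query_name, query_emb, kg2_names, kg2_embs, k=4):
    """Rank candidate equivalent entities in the other KG by combining
    embedding cosine similarity with a surface-form similarity score."""
    emb_sim = kg2_embs @ query_emb / (
        np.linalg.norm(kg2_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    name_sim = np.array(
        [SequenceMatcher(None, query_name, n).ratio() for n in kg2_names]
    )
    score = 0.5 * emb_sim + 0.5 * name_sim   # illustrative weighting
    top = np.argsort(-score)[:k]
    return [kg2_names[i] for i in top]

def multi_choice_prompt(query_name, candidates):
    """Format the shortlist as a multiple-choice question for an LLM."""
    opts = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(candidates))
    return f"Which entity refers to the same thing as '{query_name}'?\n{opts}"
```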
Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation
Lin, Zhiwei, Chen, Jun, Tang, Boshi, Sha, Binzhu, Yang, Jing, Ju, Yaolong, Fan, Fan, Kang, Shiyin, Wu, Zhiyong, Meng, Helen
Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, and some VAE-based works have yielded outstanding results and attracted considerable attention. Nevertheless, previous VAEs still suffer from overly long feature sequences and generated results that lack contextual coherence, so the challenge of modeling long multi-track symbolic music remains unaddressed. To this end, we propose Multi-view MidiVAE, one of the first VAE methods to effectively model and generate long multi-track symbolic music. Multi-view MidiVAE utilizes a two-dimensional (2-D) representation, OctupleMIDI, to capture relationships among notes while reducing feature sequence length. Moreover, we capture instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy that integrates both Track- and Bar-view MidiVAE features. Objective and subjective experimental results on the CocoChorales dataset demonstrate that, compared to the baseline, Multi-view MidiVAE exhibits significant improvements in modeling long multi-track symbolic music.
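The hybrid two-view encoding can be illustrated with a toy module that is much simpler than the actual Multi-view MidiVAE architecture: two encoders produce track-view and bar-view latents, which are fused before a single decoder reconstructs the features. All layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class TwoViewVAE(nn.Module):
    """Toy illustration of fusing two latent views (not the paper's model)."""
    def __init__(self, feat_dim=64, latent_dim=32):
        super().__init__()
        self.track_enc = nn.Linear(feat_dim, 2 * latent_dim)  # mu and logvar
        self.bar_enc = nn.Linear(feat_dim, 2 * latent_dim)
        self.dec = nn.Linear(2 * latent_dim, feat_dim)

    def forward(self, track_x, bar_x):
        mu_t, logvar_t = self.track_enc(track_x).chunk(2, dim=-1)
        mu_b, logvar_b = self.bar_enc(bar_x).chunk(2, dim=-1)
        # Reparameterized samples from each view, concatenated before decoding.
        z_t = mu_t + torch.randn_like(mu_t) * torch.exp(0.5 * logvar_t)
        z_b = mu_b + torch.randn_like(mu_b) * torch.exp(0.5 * logvar_b)
        return self.dec(torch.cat([z_t, z_b], dim=-1))
```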
Federated Q-Learning: Linear Regret Speedup with Low Communication Cost
Zheng, Zhong, Gao, Fengyu, Xue, Lingzhou, Yang, Jing
In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample complexity, in similar settings, it is unclear whether it is possible to design a model-free algorithm to achieve linear regret speedup with low communication cost. We propose two federated Q-Learning algorithms, termed FedQ-Hoeffding and FedQ-Bernstein, and show that their total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large, while the communication cost scales logarithmically in the total number of time steps $T$. These results rely on an event-triggered synchronization mechanism between the agents and the server, a novel step size selection when the server aggregates the local estimates of the state-action values to form the global estimates, and a set of new concentration inequalities to bound the sum of non-martingale differences. This is the first work showing that linear regret speedup and logarithmic communication cost can be achieved by model-free algorithms in federated reinforcement learning.
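The event-triggered synchronization and server-side aggregation can be illustrated with a simplified tabular sketch; the actual trigger condition and step-size rule in FedQ-Hoeffding/Bernstein differ, and the threshold and function names below are assumptions.

```python
import numpy as np

def should_sync(counts_since_sync, threshold=8):
    """Event trigger: request a synchronization round once any (state, action)
    pair has accumulated enough new visits since the last round."""
    return bool((counts_since_sync >= threshold).any())

def server_aggregate(global_q, local_qs, local_counts):
    """Visit-count-weighted aggregation of local Q-estimates at a sync round
    (a simplified stand-in for the paper's step-size rule): entries no agent
    visited keep the current global estimate."""
    local_qs = np.stack(local_qs)          # (num_agents, S, A)
    local_counts = np.stack(local_counts)  # (num_agents, S, A)
    total = local_counts.sum(axis=0)
    weighted = (local_counts * local_qs).sum(axis=0)
    return np.where(total > 0, weighted / np.maximum(total, 1), global_q)
```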
PMET: Precise Model Editing in a Transformer
Li, Xiaopeng, Li, Shasha, Song, Shezheng, Yang, Jing, Ma, Jun, Yu, Jie
Model editing techniques, which modify a minor proportion of the knowledge in Large Language Models (LLMs) at relatively low cost, have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the weights of the FFN in LLMs. However, the information flow into TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), the FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required by the FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze the hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on the above findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of the FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that MHSA encodes certain general knowledge extraction patterns and indicating that it stores only a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.
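The final step, mapping optimized FFN hidden states back into FFN weights, is typically posed as a regularized least-squares update in this line of work. The sketch below shows that generic update in the spirit of PMET/MEMIT-style editing, not the exact PMET procedure; the regularization and function name are assumptions.

```python
import numpy as np

def ffn_weight_update(W, keys, target_values, reg=1e-2):
    """Illustrative closed-form edit: find dW so that (W + dW) @ k_i is close
    to the optimized FFN hidden state v_i for each edited key, via a
    ridge-regularized least-squares solution.
    W: (d_out, d_in); keys: (n, d_in); target_values: (n, d_out)."""
    residual = target_values - keys @ W.T              # what the edit must add
    gram = keys.T @ keys + reg * np.eye(keys.shape[1])
    dW = np.linalg.solve(gram, keys.T @ residual).T    # (d_out, d_in)
    return W + dW
```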
Advancing RAN Slicing with Offline Reinforcement Learning
Yang, Kun, Yeh, Shu-ping, Zhang, Menglei, Sydir, Jerry, Yang, Jing, Shen, Cong
Dynamic radio resource management (RRM) in wireless networks presents significant challenges, particularly in the context of Radio Access Network (RAN) slicing. This technology, crucial for catering to varying user requirements, often grapples with complex optimization scenarios. Existing Reinforcement Learning (RL) approaches, while achieving good performance in RAN slicing, typically rely on online algorithms or behavior cloning. These methods necessitate either continuous environmental interactions or access to high-quality datasets, hindering their practical deployment. To address these limitations, this paper introduces offline RL to solve the RAN slicing problem, marking a significant shift towards more feasible and adaptive RRM methods. We demonstrate how offline RL can effectively learn near-optimal policies from sub-optimal datasets, a notable advancement over existing practices. Our research highlights the inherent flexibility of offline RL, showcasing its ability to adjust policy criteria without the need for additional environmental interactions. Furthermore, we present empirical evidence of the efficacy of offline RL in adapting to various service-level requirements, illustrating its potential in diverse RAN slicing scenarios.
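The abstract does not specify which offline RL algorithm is used; the sketch below shows a generic tabular, conservatively regularized offline Q-learning update over a fixed dataset of logged RRM transitions, purely as an illustration of learning from data without environment interaction.

```python
import numpy as np

def offline_conservative_q_update(Q, dataset, alpha=0.1, gamma=0.9, penalty=0.5):
    """Tabular sketch of conservative offline Q-learning (not the paper's
    algorithm). dataset entries: (state, action, reward, next_state).
    The penalty pushes down actions the logged policy did not take,
    discouraging the learned policy from drifting outside the data."""
    for s, a, r, s_next in dataset:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        for b in range(Q.shape[1]):
            if b != a:
                Q[s, b] -= alpha * penalty
    return Q
```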
Differentially Private Wireless Federated Learning Using Orthogonal Sequences
Wei, Xizixiang, Wang, Tianhao, Huang, Ruiquan, Shen, Cong, Yang, Jing, Poor, H. Vincent
We propose a privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the perspective of communication designs, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS offers both item-level and client-level differential privacy (DP) guarantees. Moreover, by properly adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A new FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between the achieved convergence rate and differential privacy levels. Experimental results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that the analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels. A preliminary version of this work was presented at the 2023 IEEE International Conference on Communications [1].
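The orthogonal-sequence property that this line of AirComp designs builds on can be shown with a toy example; this is not the FLORAS transceiver itself, and the Walsh-Hadamard choice and sequence length are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import hadamard

def spread_and_superimpose(updates, seq_len=8):
    """Each client spreads its scalar update with a distinct Walsh-Hadamard
    row; the waveforms add on the air, and the receiver recovers each update
    by correlating with the corresponding orthogonal sequence."""
    H = hadamard(seq_len)                       # rows are mutually orthogonal
    received = sum(u * H[k] for k, u in enumerate(updates))
    recovered = [received @ H[k] / seq_len for k in range(len(updates))]
    return recovered

print(spread_and_superimpose([0.2, -1.0, 0.5]))  # ~[0.2, -1.0, 0.5]
```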
Reward Teaching for Federated Multi-armed Bandits
Shi, Chengshuai, Xiong, Wei, Shen, Cong, Yang, Jing
Most of the existing federated multi-armed bandits (FMAB) designs are based on the presumption that clients will implement the specified design to collaborate with the server. In reality, however, it may not be possible to modify the clients' existing protocols. To address this challenge, this work focuses on clients who always maximize their individual cumulative rewards, and introduces a novel idea of "reward teaching", where the server guides the clients towards global optimality through implicit local reward adjustments. Under this framework, the server faces two tightly coupled tasks of bandit learning and target teaching, whose combination is non-trivial and challenging. A phased approach, called Teaching-After-Learning (TAL), is first designed to encourage and discourage clients' explorations separately. General performance analyses of TAL are established when the clients' strategies satisfy certain mild requirements. With novel technical approaches developed to analyze the warm-start behaviors of bandit algorithms, particularized guarantees of TAL with clients running UCB or epsilon-greedy strategies are then obtained. These results demonstrate that TAL achieves logarithmic regrets while only incurring logarithmic adjustment costs, which is order-optimal w.r.t. a natural lower bound. As a further extension, the Teaching-While-Learning (TWL) algorithm is developed with the idea of successive arm elimination to break the non-adaptive phase separation in TAL. Rigorous analyses demonstrate that when facing clients running UCB1, TWL outperforms TAL in terms of the dependencies on sub-optimality gaps thanks to its adaptive design. Experimental results demonstrate the effectiveness and generality of the proposed algorithms.
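A toy reward-adjustment rule makes the "reward teaching" idea concrete: the server perturbs the reward a self-interested client observes so that the client's own reward maximization steers it toward the globally optimal arm. This is an illustration only, not the TAL/TWL adjustment policy; the phase names and bonus size are assumptions.

```python
def adjust_reward(arm, raw_reward, target_arm, phase, bonus=0.3):
    """Toy reward-teaching rule: inflate rewards on the target arm during the
    encouragement phase, deflate the other arms during the discouragement
    phase, so a client that maximizes its own rewards drifts to the target."""
    if phase == "encourage" and arm == target_arm:
        return raw_reward + bonus
    if phase == "discourage" and arm != target_arm:
        return max(0.0, raw_reward - bonus)
    return raw_reward
```

A UCB or epsilon-greedy client needs no modification: it simply runs on the adjusted rewards, which is exactly the constraint the paper works under.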
A New Approach to Intuitionistic Fuzzy Decision Making Based on Projection Technology and Cosine Similarity Measure
Yang, Jing, Su, Wei
For a multi-attribute decision making (MADM) problem, the information of alternatives under different attributes is given in the form of intuitionistic fuzzy numbers (IFNs). The intuitionistic fuzzy set (IFS) plays an important role in dealing with uncertain and incomplete information, and the similarity measure of intuitionistic fuzzy sets (IFSs) has long been a research hotspot. This paper first proposes a new similarity measure of IFSs based on projection technology and the cosine similarity measure, which considers both the direction and the length of IFSs. The objective of the present paper is to develop a MADM method and a medical diagnosis method under IFSs using projection technology and the cosine similarity measure. Several examples are used to compare the proposed algorithm with existing methods. The comparison results show that the proposed algorithm is effective and can identify the optimal scheme accurately. In the medical diagnosis area, it can be used to quickly diagnose disease. The proposed method enriches the existing similarity measure methods and can be applied not only to IFSs but also to interval-valued intuitionistic fuzzy sets (IVIFSs).
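For reference, the two standard building blocks named in the abstract are given below; the paper's combined measure may weight or normalize them differently, so these formulas are stated only as the commonly used forms for IFSs A and B over X = {x_1, ..., x_n} with membership mu and non-membership nu.

```latex
% Cosine similarity measure of IFSs (standard form)
C(A,B) = \frac{1}{n}\sum_{i=1}^{n}
  \frac{\mu_A(x_i)\,\mu_B(x_i) + \nu_A(x_i)\,\nu_B(x_i)}
       {\sqrt{\mu_A(x_i)^2 + \nu_A(x_i)^2}\,\sqrt{\mu_B(x_i)^2 + \nu_B(x_i)^2}}

% Projection of A onto B (standard form), with inner product and norm
\operatorname{Proj}_B(A) = \frac{A \cdot B}{\lVert B \rVert}, \qquad
A \cdot B = \sum_{i=1}^{n}\bigl(\mu_A(x_i)\mu_B(x_i) + \nu_A(x_i)\nu_B(x_i)\bigr), \qquad
\lVert B \rVert = \sqrt{\sum_{i=1}^{n}\bigl(\mu_B(x_i)^2 + \nu_B(x_i)^2\bigr)}
```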