staleness
SupplementaryMaterial
R(h). (23) Here for simplicity, we abused the symbolD in(22)by maximizing outh0 in the originalD. In the top-left areaP,suppose only oneexample (markedbyxwith vertical coordinate1)isconfidently labeled as positive, and the rest examples are highly inconfidently labeled, hence not to contribute to the riskR. Similarly,there isonly one confidently labeled example ()inthe bottom-right area ofP, and it is negative with vertical coordinate 1. Wheneverλ > 2, the optimalhλ is in(0,1)and can be solved by a quadratic equation. In contrast,di-MDD is immune to this problem becauseRis used only to determineh, while the di-MDD value itself is solely contributed byD. Same as the scenario of largeλ, we do not change the feature distribution of source and target domains, hence keepingD(h) = 1 |h|.
ParaBlock: Communication-Computation Parallel Block Coordinate Federated Learning for Large Language Models
Wang, Yujia, Cao, Yuanpu, Chen, Jinghui
Federated learning (FL) has been extensively studied as a privacy-preserving training paradigm. Recently, federated block coordinate descent scheme has become a popular option in training large-scale models, as it allows clients to train only a subset of the model locally instead of the entire model. However, in the era of large language models (LLMs), even a single block can contain a significant number of parameters, posing substantial communication latency, particularly for resource-constrained clients. To address this challenge in federated training/fine-tuning LLMs, we propose ParaBlock, a novel approach that establishes two parallel threads for communication and computation to enhance communication efficiency. We theoretically prove that the proposed ParaBlock achieves the same convergence rate as the standard federated block coordinate descent methods. Empirical evaluations on fine-tuning LLMs on general instruction following and mathematical reasoning confirm that ParaBlock not only maintains strong performance but also significantly improves communication efficiency.
- North America > United States > Pennsylvania (0.04)
- North America > United States > Virginia (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- North America > United States > California (0.14)
- North America > United States > Illinois (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology (0.47)
- Government > Regional Government (0.46)
VISAGNN: Versatile Staleness-Aware Efficient Training on Large-Scale Graphs
Graph Neural Networks (GNNs) have shown exceptional success in graph representation learning and a wide range of real-world applications. However, scaling deeper GNNs poses challenges due to the neighbor explosion problem when training on large-scale graphs. To mitigate this, a promising class of GNN training algorithms utilizes historical embeddings to reduce computation and memory costs while preserving the expressiveness of the model. These methods leverage historical embeddings for out-of-batch nodes, effectively approximating full-batch training without losing any neighbor information-a limitation found in traditional sampling methods. However, the staleness of these historical embeddings often introduces significant bias, acting as a bottleneck that can adversely affect model performance. In this paper, we propose a novel VersatIle Staleness-Aware GNN, named VISAGNN, which dynamically and adaptively incorporates staleness criteria into the large-scale GNN training process. By embedding staleness into the message passing mechanism, loss function, and historical embeddings during training, our approach enables the model to adaptively mitigate the negative effects of stale embeddings, thereby reducing estimation errors and enhancing downstream accuracy. Comprehensive experiments demonstrate the effectiveness of our method in overcoming the staleness issue of existing historical embedding techniques, showcasing its superior performance and efficiency on large-scale benchmarks, along with significantly faster convergence.
- North America > United States > North Carolina (0.40)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.46)
- North America > United States > California (0.14)
- North America > United States > Illinois (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology (0.47)
- Government > Regional Government (0.46)