sgc
A Training Configurations
We summarize the data statistics in our experiments in Table 1. For both fully supervised and semi-supervised node classification tasks on the citation networks Cora, Citeseer and Pubmed, we train our DGC following the hyper-parameters in SGC [5]. Specifically, we train DGC for 100 epochs using Adam [2] with a learning rate of 0.2. For weight decay, as in SGC, we tune this hyperparameter on each dataset using hyperopt [1] for 10,000 trials. For the large-scale inductive learning task on the Reddit network, we also follow the protocols of SGC [5], where we use the L-BFGS [3] optimizer for 2 epochs with no weight decay.
- North America > United States (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Research Report (0.46)
- Workflow (0.46)
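The configuration above can be illustrated with a minimal NumPy sketch of an SGC-style pipeline: features are propagated once with the normalized adjacency matrix, and a linear softmax classifier is then trained full-batch with Adam for 100 epochs at learning rate 0.2. This is not the DGC code. The helper names, the six-node toy graph, the one-hot stand-in features, the propagation depth `K = 2`, and the weight decay `5e-4` are all assumptions made for illustration; the excerpt says weight decay is tuned per dataset with hyperopt, which is not reproduced here.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return S = D^{-1/2} (A + I) D^{-1/2}, the SGC-style propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_adam(feats, labels, n_classes, epochs=100, lr=0.2,
                       weight_decay=5e-4):
    """Full-batch softmax regression trained with Adam.

    Weight decay is applied as an L2 term added to the gradient.
    The value 5e-4 is a placeholder, not a tuned setting.
    """
    n, d = feats.shape
    w = np.zeros((d, n_classes))
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    onehot = np.eye(n_classes)[labels]
    for t in range(1, epochs + 1):
        probs = softmax(feats @ w)
        grad = feats.T @ (probs - onehot) / n + weight_decay * w
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy graph (assumption): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

labels = np.array([0, 0, 0, 1, 1, 1])
x = np.eye(6)                        # one-hot stand-in features
K = 2                                # propagation depth (assumption)
s = normalized_adjacency(adj)
feats = np.linalg.matrix_power(s, K) @ x   # precomputed once, before training

w = train_softmax_adam(feats, labels, n_classes=2)
preds = (feats @ w).argmax(axis=1)
accuracy = float((preds == labels).mean())
print("training accuracy:", accuracy)
```

Because propagation happens before training, the loop only ever touches a fixed feature matrix, which is why a large learning rate and a small epoch budget are workable in this setting.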
SCRN escape saddle-points and converge to local minimizers faster under Strong Growth Condition (SGC) (which
We thank all the reviewers for their valuable comments. Prior works (e.g., [VBS18]) considered only convergence to critical We provide our results in both the zeroth and higher order settings. SGC assumption for unbounded functions, which was not done before in the literature. SCRN is also significantly involved under SGC (especially in zeroth-order setup); see also Remark 6 and 7. Please see Lines 2-10 above. However, the method in [AL18] is a theoretical computer science style reduction approach.
Sparse Gradient Compression for Fine-Tuning Large Language Models
Yang, David H., Amiri, Mohammad Mohammadi, Pedapati, Tejaswini, Chaudhury, Subhajit, Chen, Pin-Yu
Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. However, the high memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size. To address this, parameter-efficient fine-tuning (PEFT) methods have been proposed to minimize the number of parameters required for fine-tuning LLMs. However, these approaches often tie the number of optimizer states to the dimensions of the model parameters, limiting flexibility and control during fine-tuning. In this paper, we propose sparse gradient compression (SGC), a training regime designed to address these limitations. Our approach leverages inherent sparsity in gradients to compress optimizer states by projecting them onto a low-dimensional subspace, with dimensionality independent of the original model's parameters. By enabling optimizer state updates in an arbitrary low-dimensional subspace, SGC offers a flexible tradeoff between memory efficiency and performance. We demonstrate through experiments that SGC can decrease memory usage in optimizer states more effectively than existing PEFT methods. Furthermore, by fine-tuning LLMs on various downstream tasks, we show that SGC can deliver superior performance while substantially lowering optimizer state memory requirements, particularly in data-limited and memory-limited settings.
- North America > United States (0.14)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
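To make the idea of keeping optimizer states in a low-dimensional subspace concrete, here is a deliberately simplified sketch. It is not the algorithm from the paper, whose projection is not described in this abstract. As an assumption, the subspace is a fixed set of `k` coordinates chosen from the largest-magnitude entries of the first gradient, and Adam's two moment buffers are stored only for those coordinates. The class name `SparseSubspaceAdam` and the toy quadratic objective are hypothetical.

```python
import numpy as np

class SparseSubspaceAdam:
    """Adam whose moment buffers cover only k selected coordinates.

    Illustrative assumption: the support is fixed after the first step.
    """

    def __init__(self, k, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.k, self.lr = k, lr
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = np.zeros(k)   # first moment, size k instead of d
        self.v = np.zeros(k)   # second moment, size k instead of d
        self.t = 0
        self.idx = None

    def step(self, params, grad):
        if self.idx is None:
            # Pick the k coordinates with the largest gradient magnitude.
            self.idx = np.argpartition(np.abs(grad), -self.k)[-self.k:]
        g = grad[self.idx]     # project the gradient onto the k coordinates
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Map the low-dimensional update back onto the full parameter vector.
        params[self.idx] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return params

# Toy problem (assumption): a quadratic whose gradient is sparse.
d, k = 1000, 10
rng = np.random.default_rng(0)
support = rng.choice(d, size=k, replace=False)
target = np.zeros(d)
target[support] = 1.0
params = np.zeros(d)

def loss(p):
    return 0.5 * float(np.sum((p - target) ** 2))

initial_loss = loss(params)
opt = SparseSubspaceAdam(k=k)
for _ in range(100):
    params = opt.step(params, params - target)   # gradient of the quadratic
final_loss = loss(params)

state_floats = opt.m.size + opt.v.size           # 2k, versus 2d for dense Adam
print(state_floats, 2 * d, initial_loss, final_loss)
```

In this sketch the optimizer state holds `2k` floats regardless of `d`, which is the kind of decoupling between state size and parameter count that the abstract describes.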
TSC: A Simple Two-Sided Constraint against Over-Smoothing
Peng, Furong, Liu, Kang, Lu, Xuan, Qian, Yuhua, Yan, Hongren, Ma, Chao
Graph Convolutional Neural Network (GCN), a widely adopted method for analyzing relational data, enhances node discriminability through the aggregation of neighboring information. Usually, stacking multiple layers can improve the performance of GCN by leveraging information from high-order neighbors. However, increasing the network depth induces the over-smoothing problem, which can be attributed to changes in the quality and quantity of neighbors: (a) neighbor quality: nodes' neighbors increasingly overlap at high orders, making the aggregated information indistinguishable; (b) neighbor quantity: the exponentially growing number of aggregated neighbors submerges each node's initial features through recursive aggregation. Current solutions mainly focus on one of the above causes and seldom consider both at once. Aiming to tackle both causes of over-smoothing in one shot, we introduce a simple Two-Sided Constraint (TSC) for GCNs, comprising two straightforward yet potent techniques: random masking and contrastive constraint. The random masking acts on the representation matrix's columns to regulate the degree of information aggregation from neighbors, thus preventing the convergence of node representations. Meanwhile, the contrastive constraint, applied to the representation matrix's rows, enhances the discriminability of the nodes. Designed as a plug-in module, TSC can be easily coupled with GCN or SGC architectures. Experimental analyses on diverse real-world graph datasets verify that our approach markedly reduces both the convergence of node representations and the performance degradation in deeper GCNs.
- Asia > China > Shanxi Province (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (7 more...)
Alleviating Over-Smoothing via Aggregation over Compact Manifolds
Zhou, Dongzhuoran, Yang, Hui, Xiong, Bo, Ma, Yue, Kharlamov, Evgeny
Graph neural networks (GNNs) have achieved significant success in various applications. Most GNNs learn node features by aggregating information from each node's neighbors and applying a feature transformation in each layer. However, the node features become indistinguishable after many layers, leading to performance deterioration: a significant limitation known as over-smoothing. Past work adopted various techniques for addressing this issue, such as normalization and skip-connection of layer-wise output. In our study, we find that the information aggregations in existing work are all contracted aggregations, with the intrinsic property that features will inevitably converge to the same single point after many layers. To this end, we propose the aggregation over compact manifolds (ACM) method, which replaces the existing information aggregation with aggregation over compact manifolds, a special type of manifold, and thereby avoids contracted aggregations. In this work, we theoretically analyze contracted aggregation and its properties. We also provide an extensive empirical evaluation that shows ACM can effectively alleviate over-smoothing and outperforms the state-of-the-art.
- North America > United States > Wisconsin (0.04)
- North America > United States > Texas (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- (2 more...)
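The claim that repeated aggregation drives node features to a single point can be checked numerically on a small example. The sketch below is an illustration only, not the ACM method: it applies plain mean aggregation with self-loops (a row-stochastic matrix) to random features on a hypothetical six-node graph, with no feature transformation, and tracks how far apart the node features remain after each layer. The graph, the feature size, and the `spread` measure are assumptions.

```python
import numpy as np

# Toy graph (assumption): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Mean aggregation over each node and its neighbors: P = D^{-1} (A + I).
a_hat = adj + np.eye(6)
p = a_hat / a_hat.sum(axis=1, keepdims=True)

def spread(h):
    """Largest gap between any two nodes in any single feature dimension."""
    return float((h.max(axis=0) - h.min(axis=0)).max())

rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))      # random initial node features
spreads = [spread(h)]
for _ in range(64):
    h = p @ h                    # one aggregation layer, no learned weights
    spreads.append(spread(h))

for layer in (0, 1, 2, 4, 8, 16, 32, 64):
    print(f"layers={layer:2d}  spread={spreads[layer]:.2e}")
```

Each new feature value is a convex combination of the previous ones, so the spread can never grow, and on a connected graph with self-loops it shrinks toward zero as layers are added.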
Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence Function
Wang, Binghui, Zhou, Tianxiang, Lin, Minhua, Zhou, Pan, Li, Ang, Pang, Meng, Li, Hai, Chen, Yiran
Graph neural network (GNN), the mainstream method to learn on graph data, is vulnerable to graph evasion attacks, where an attacker slightly perturbing the graph structure can fool trained GNN models. Existing work has at least one of the following drawbacks: 1) it is limited to directly attacking two-layer GNNs; 2) it is inefficient; and 3) it is impractical, as it needs to know all or part of the GNN model parameters. We address the above drawbacks and propose an influence-based efficient, direct, and restricted black-box evasion attack on any-layer GNNs. Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively. Then we observe that GNNs and LP are strongly connected in terms of our defined influences. Based on this, we can then reformulate the evasion attack on GNNs as calculating label influence on LP, which is inherently applicable to any-layer GNNs and requires no information about the internal GNN model. Finally, we propose an efficient algorithm to calculate label influence. Experimental results on various graph datasets show that, compared to state-of-the-art white-box attacks, our attack can achieve comparable attack performance, but has a 5-50x speedup when attacking two-layer GNNs. Moreover, our attack is effective in attacking multi-layer GNNs. Source code and the full version are available at https://github.com/ventr1c/InfAttack.
- North America > Mexico > Yucatán > Mérida (0.05)
- Asia > China > Jiangxi Province > Nanchang (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- (4 more...)
- Information Technology > Security & Privacy (0.69)
- Transportation > Air (0.62)
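For readers unfamiliar with label propagation (LP), which the abstract above uses as a surrogate for GNNs, the following is a minimal sketch of the standard iterative form F <- alpha * S * F + (1 - alpha) * Y. It shows only plain LP on a hypothetical toy graph. It does not implement the paper's label influence or the attack itself, and the function name, `alpha`, the iteration count, and the choice of labeled nodes are assumptions.

```python
import numpy as np

def label_propagation(adj, labels, labeled_idx, n_classes, alpha=0.9, iters=50):
    """Standard label propagation with a symmetrically normalized adjacency.

    Only the nodes in labeled_idx contribute their labels; every node
    receives a predicted class.
    """
    n = adj.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    s = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    y = np.zeros((n, n_classes))
    y[labeled_idx, labels[labeled_idx]] = 1.0   # one-hot seeds
    f = y.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1 - alpha) * y   # diffuse, then re-inject seeds
    return f.argmax(axis=1)

# Toy graph (assumption): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

labels = np.array([0, 0, 0, 1, 1, 1])
labeled_idx = np.array([0, 5])                  # one seed per class
lp_preds = label_propagation(adj, labels, labeled_idx, n_classes=2)
print(lp_preds)
```

LP needs only the graph structure and a few known labels, with no model parameters, which is consistent with the abstract's point that an LP-based formulation requires no knowledge of the GNN's internals.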