AITopics | sgc

Collaborating Authors

sgc

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

2d95666e2649fcfc6e3af75e09f5adb9-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 07:26:08 GMT

artificial intelligence, dgc, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

2d95666e2649fcfc6e3af75e09f5adb9-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 07:26:04 GMT

artificial intelligence, gcn, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.29)

Genre:

Research Report (0.46)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Country:

North America > United States (0.14)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report (0.46)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

SCRN escape saddle-points and converge to local minimizers faster under Strong Growth Condition (SGC) (which

Neural Information Processing SystemsAug-15-2025, 03:30:00 GMT

We thank all the reviewers for their valuable comments. Prior works (e.g., [VBS18]) considered only convergence to critical We provide our results in both the zeroth and higher order settings. SGC assumption for unbounded functions, which was not done before in the literature. SCRN is also significantly involved under SGC (especially in zeroth-order setup); see also Remark 6 and 7. Please see Lines 2-10 above. However, the method in [AL18] is a theoretical computer science style reduction approach.

assumption, scrn escape saddle-point and converge, sgc, (10 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

Add feedback

Sparse Gradient Compression for Fine-Tuning Large Language Models

Yang, David H., Amiri, Mohammad Mohammadi, Pedapati, Tejaswini, Chaudhury, Subhajit, Chen, Pin-Yu

arXiv.org Artificial IntelligenceJan-31-2025

Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. However, the high memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size. To address this, parameter efficient fine-tuning (PEFT) methods have been proposed to minimize the number of parameters required for fine-tuning LLMs. However, these approaches often tie the number of optimizer states to dimensions of model parameters, limiting flexibility and control during fine-tuning. In this paper, we propose sparse gradient compression (SGC), a training regime designed to address these limitations. Our approach leverages inherent sparsity in gradients to compress optimizer states by projecting them onto a low-dimensonal subspace, with dimensionality independent of the original model's parameters. By enabling optimizer state updates in an arbitrary low-dimensional subspace, SGC offers a flexible tradeoff between memory efficiency and performance. We demonstrate through experiments that SGC can decrease memory usage in optimizer states more effectively than existing PEFT methods. Furthermore, by fine-tuning LLMs on various downstream tasks, we show that SGC can deliver superior performance while substantially lowering optimizer state memory requirements, particularly in both data-limited and memory-limited settings.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.00311

Country:

North America > United States (0.14)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report (0.87)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

TSC: A Simple Two-Sided Constraint against Over-Smoothing

Peng, Furong, Liu, Kang, Lu, Xuan, Qian, Yuhua, Yan, Hongren, Ma, Chao

arXiv.org Artificial IntelligenceAug-6-2024

Graph Convolutional Neural Network (GCN), a widely adopted method for analyzing relational data, enhances node discriminability through the aggregation of neighboring information. Usually, stacking multiple layers can improve the performance of GCN by leveraging information from high-order neighbors. However, the increase of the network depth will induce the over-smoothing problem, which can be attributed to the quality and quantity of neighbors changing: (a) neighbor quality, node's neighbors become overlapping in high order, leading to aggregated information becoming indistinguishable, (b) neighbor quantity, the exponentially growing aggregated neighbors submerges the node's initial feature by recursively aggregating operations. Current solutions mainly focus on one of the above causes and seldom consider both at once. Aiming at tackling both causes of over-smoothing in one shot, we introduce a simple Two-Sided Constraint (TSC) for GCNs, comprising two straightforward yet potent techniques: random masking and contrastive constraint. The random masking acts on the representation matrix's columns to regulate the degree of information aggregation from neighbors, thus preventing the convergence of node representations. Meanwhile, the contrastive constraint, applied to the representation matrix's rows, enhances the discriminability of the nodes. Designed as a plug-in module, TSC can be easily coupled with GCN or SGC architectures. Experimental analyses on diverse real-world graph datasets verify that our approach markedly reduces the convergence of node's representation and the performance degradation in deeper GCN.

information, node, representation, (14 more...)

arXiv.org Artificial Intelligence

2408.03152

Country:

Asia > China > Shanxi Province (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Alleviating Over-Smoothing via Aggregation over Compact Manifolds

Zhou, Dongzhuoran, Yang, Hui, Xiong, Bo, Ma, Yue, Kharlamov, Evgeny

arXiv.org Artificial IntelligenceJul-27-2024

Graph neural networks (GNNs) have achieved significant success in various applications. Most GNNs learn the node features with information aggregation of its neighbors and feature transformation in each layer. However, the node features become indistinguishable after many layers, leading to performance deterioration: a significant limitation known as over-smoothing. Past work adopted various techniques for addressing this issue, such as normalization and skip-connection of layer-wise output. After the study, we found that the information aggregations in existing work are all contracted aggregations, with the intrinsic property that features will inevitably converge to the same single point after many layers. To this end, we propose the aggregation over compacted manifolds method (ACM) that replaces the existing information aggregation with aggregation over compact manifolds, a special type of manifold, which avoids contracted aggregations. In this work, we theoretically analyze contracted aggregation and its properties. We also provide an extensive empirical evaluation that shows ACM can effectively alleviate over-smoothing and outperforms the state-of-the-art.

aggregation, alleviating over-smoothing, manifold, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-97-2253-2_31

2407.19231

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Texas (0.04)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence Function

Wang, Binghui, Zhou, Tianxiang, Lin, Minhua, Zhou, Pan, Li, Ang, Pang, Meng, Li, Hai, Chen, Yiran

arXiv.org Artificial IntelligenceDec-15-2023

Graph neural network (GNN), the mainstream method to learn on graph data, is vulnerable to graph evasion attacks, where an attacker slightly perturbing the graph structure can fool trained GNN models. Existing work has at least one of the following drawbacks: 1) limited to directly attack two-layer GNNs; 2) inefficient; and 3) impractical, as they need to know full or part of GNN model parameters. We address the above drawbacks and propose an influence-based \emph{efficient, direct, and restricted black-box} evasion attack to \emph{any-layer} GNNs. Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively. Then we observe that GNNs and LP are strongly connected in terms of our defined influences. Based on this, we can then reformulate the evasion attack to GNNs as calculating label influence on LP, which is \emph{inherently} applicable to any-layer GNNs, while no need to know information about the internal GNN model. Finally, we propose an efficient algorithm to calculate label influence. Experimental results on various graph datasets show that, compared to state-of-the-art white-box attacks, our attack can achieve comparable attack performance, but has a 5-50x speedup when attacking two-layer GNNs. Moreover, our attack is effective to attack multi-layer GNNs\footnote{Source code and full version is in the link: \url{https://github.com/ventr1c/InfAttack}}.

graph, label influence, target node, (11 more...)

arXiv.org Artificial Intelligence

2009.00203

Country: