critical layer
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
Nepal, Aadim, Shrestha, Safal, Shrestha, Anubhav, Kim, Minwu, Naghiyev, Jalal, Shwartz-Ziv, Ravid, Ross, Keith
Large language models improve at math after instruction tuning, reinforcement learning, or knowledge distillation. We ask whether these gains come from major changes in the transformer layers or from smaller adjustments that keep the original structure. Using layer-wise ablation on base and trained variants, we find that math reasoning depends on a few critical layers, which stay important across all post-training methods. Removing these layers reduces math accuracy by as much as 80%, whereas factual recall tasks show comparatively small drops. This suggests that specialized layers for mathematical tasks form during pre-training and remain stable afterward. Measuring Normalized Mutual Information (NMI), we find that near these critical layers, tokens drift from their original syntactic clusters toward representations aligned with tokens that are less syntactically related but potentially more useful for downstream tasks.
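A minimal sketch of the layer-wise ablation protocol described above: skip one layer of a toy transformer at a time and measure how much the output moves. The model, sizes, and raw drift metric are illustrative assumptions, not the authors' setup; the paper measures task accuracy on math versus factual-recall benchmarks instead.

```python
# Toy layer-wise ablation: skip one layer at a time and measure output drift.
# Stand-in model and metric; not the paper's models or benchmarks.
import torch
import torch.nn as nn

class ToyTransformer(nn.Module):
    def __init__(self, d_model=64, n_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )

    def forward(self, x, skip=None):
        # Ablate layer `skip` by passing activations through unchanged.
        for i, layer in enumerate(self.layers):
            if i != skip:
                x = layer(x)
        return x

model = ToyTransformer().eval()
x = torch.randn(2, 16, 64)  # (batch, seq, d_model)
with torch.no_grad():
    baseline = model(x)
    for i in range(len(model.layers)):
        drift = ((model(x, skip=i) - baseline).norm() / baseline.norm()).item()
        print(f"layer {i}: relative output change {drift:.3f}")
```

Layers whose removal causes large, localized degradation are the candidates for critical layers.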
Spectral Insights into Data-Oblivious Critical Layers in Large Language Models
Liu, Xuyuan, Hsiung, Lei, Yang, Yaoqing, Yan, Yujun
Understanding how feature representations evolve across layers in large language models (LLMs) is key to improving their interpretability and robustness. While recent studies have identified critical layers linked to specific functions or behaviors, these efforts typically rely on data-dependent analyses of fine-tuned models, limiting their use to post-hoc settings. In contrast, we introduce a data-oblivious approach to identify intrinsic critical layers in pre-fine-tuned LLMs by analyzing representation dynamics via Centered Kernel Alignment (CKA). We show that layers with significant shifts in representation space are also those most affected during fine-tuning, a pattern that holds consistently across tasks for a given model. Our spectral analysis further reveals that these shifts are driven by changes in the top principal components, which encode semantic transitions from rationales to conclusions. We apply these findings to two practical scenarios: efficient domain adaptation, where fine-tuning critical layers leads to greater loss reduction than fine-tuning non-critical layers; and backdoor defense, where freezing them reduces attack success rates by up to 40%.
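For reference, linear CKA between two activation matrices fits in a few lines; the random activations below are placeholders for real per-layer representations.

```python
# Minimal linear-CKA sketch for comparing layer representations.
import numpy as np

def linear_cka(X, Y):
    # X, Y: (n_samples, dim) activation matrices from two layers.
    X = X - X.mean(0, keepdims=True)  # center features
    Y = Y - Y.mean(0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return hsic / norm

rng = np.random.default_rng(0)
acts = [rng.standard_normal((128, 64)) for _ in range(6)]  # stand-in activations
# Low CKA between adjacent layers marks a large representation shift,
# i.e., a candidate critical layer.
for i in range(len(acts) - 1):
    print(f"layers {i}->{i+1}: CKA {linear_cka(acts[i], acts[i+1]):.3f}")
```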
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
Li, Zhongyang, Li, Ziyue, Zhou, Tianyi
Mixture-of-Experts (MoE) Large Language Models (LLMs) suffer from severely sub-optimal expert pathways: our study reveals that the naive expert selection learned from pretraining leaves a surprising 10-20% accuracy gap. Motivated by this observation, we develop a novel class of test-time optimization methods to jointly re-weight, or "re-mix", the experts in different layers for each test sample. Since the test sample's ground truth is unknown, we propose to optimize a surrogate objective defined by the sample's "successful neighbors" from a reference set of samples. We introduce three surrogates and algorithms based on mode-finding, kernel regression, and the average loss of similar reference samples/tasks. To reduce the cost of optimizing whole pathways, we apply our algorithms only to the core experts' mixing weights in critical layers, which yields similar performance while saving significant computation. This leads to "Critical-Layer, Core-Expert, Collaborative Pathway Optimization" (C3PO). We apply C3PO to two recent MoE LLMs and evaluate it on six widely used benchmarks. It consistently improves the base model by 7-15% in accuracy and outperforms widely used test-time learning baselines, e.g., in-context learning and prompt/prefix tuning, by a large margin. Moreover, C3PO enables MoE LLMs with 1-3B active parameters to outperform LLMs of 7-9B parameters, further improving MoE's efficiency advantage. Our thorough ablation study sheds new light on achieving test-time improvement for MoE models.
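The core test-time move, stripped to a toy single-layer MoE: hold the pretrained experts and gate fixed and optimize a small mixing offset against a surrogate loss on reference neighbors. Everything here (single layer, dimensions, MSE surrogate) is a simplified assumption, not C3PO itself.

```python
# Toy test-time expert re-mixing: only the gate offset is optimized.
import torch
import torch.nn as nn

n_experts, d = 4, 16
experts = nn.ModuleList([nn.Linear(d, d) for _ in range(n_experts)])
gate = nn.Linear(d, n_experts)

def moe(x, offset):
    # `offset` re-weights the pretrained gate logits at test time.
    w = torch.softmax(gate(x) + offset, dim=-1)          # (batch, n_experts)
    outs = torch.stack([e(x) for e in experts], dim=-1)  # (batch, d, n_experts)
    return torch.einsum("bdn,bn->bd", outs, w)

# Surrogate objective: loss on "successful neighbors" from a reference set.
ref_x, ref_y = torch.randn(8, d), torch.randn(8, d)
offset = torch.zeros(n_experts, requires_grad=True)
opt = torch.optim.Adam([offset], lr=0.1)
for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(moe(ref_x, offset), ref_y)
    loss.backward()
    opt.step()
print("learned mixing offset:", offset.data)
```

Restricting the optimized variables to core experts in critical layers is what keeps this cheap relative to re-optimizing full pathways.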
Enhancing Non-English Capabilities of English-Centric Large Language Models through Deep Supervision Fine-Tuning
Huo, Wenshuai, Feng, Xiaocheng, Huang, Yichong, Fu, Chengpeng, Li, Baohang, Ye, Yangfan, Zhang, Zhirui, Tu, Dandan, Tang, Duyu, Lu, Yunfei, Wang, Hui, Qin, Bing
Large language models (LLMs) have demonstrated significant progress in multilingual language understanding and generation. However, due to the imbalance in training data, their capabilities in non-English languages are limited. Recent studies have revealed the English-pivot multilingual mechanism of LLMs, whereby LLMs implicitly convert non-English queries into English at the bottom layers and adopt English for thinking at the middle layers. However, because there is no explicit supervision for cross-lingual alignment in the intermediate layers of LLMs, the internal representations at these stages may become inaccurate. In this work, we introduce a deep supervision fine-tuning method (DFT) that incorporates additional supervision in the internal layers of the model to guide its workflow. Specifically, we introduce two training objectives at different layers of the LLM: one at the bottom layers to constrain the conversion of the target language into English, and another at the middle layers to constrain reasoning in English. To effectively achieve this guidance, we design two types of supervision signals, logits and features, which impose a stricter constraint and a more relaxed form of guidance, respectively. Our method guides the model not only to consider the final generated result when processing non-English inputs but also to ensure the accuracy of internal representations. We conducted extensive experiments on typical English-centric large models, LLaMA-2 and Gemma-2, and results on multiple multilingual datasets show that our method significantly outperforms traditional fine-tuning methods.
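A schematic of the deep-supervision idea: tap intermediate layers and add auxiliary losses alongside the final one. The toy stack and random targets below stand in for DFT's English-aligned supervision signals; which layers get which loss is an assumption for illustration.

```python
# Deep supervision sketch: auxiliary losses at tapped intermediate layers.
import torch
import torch.nn as nn

class DeepSupervisedStack(nn.Module):
    def __init__(self, d=32, n_layers=6, bottom=1, middle=3):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(d, d) for _ in range(n_layers)])
        self.bottom, self.middle = bottom, middle

    def forward(self, x):
        taps = {}
        for i, layer in enumerate(self.layers):
            x = torch.relu(layer(x))
            if i in (self.bottom, self.middle):
                taps[i] = x  # expose supervised intermediate states
        return x, taps

model = DeepSupervisedStack()
x = torch.randn(4, 32)
target_final = torch.randn(4, 32)
target_bottom = torch.randn(4, 32)  # stand-in: English-converted representation
target_middle = torch.randn(4, 32)  # stand-in: English reasoning representation
out, taps = model(x)
loss = (nn.functional.mse_loss(out, target_final)
        + 0.5 * nn.functional.mse_loss(taps[model.bottom], target_bottom)
        + 0.5 * nn.functional.mse_loss(taps[model.middle], target_middle))
loss.backward()
```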
Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing
Li, Xiaopeng, Wang, Shanwen, Li, Shasha, Song, Shezheng, Ji, Bin, Ma, Jun, Yu, Jie
Model editing is a powerful technique for updating the knowledge of Large Language Models (LLMs). Locate-then-edit methods are a popular class of approaches that first identify the critical layers storing knowledge, then compute the residual of the last critical layer based on the edited knowledge, and finally perform multi-layer updates using a least-squares solution by evenly distributing the residual from the first critical layer to the last. Although these methods achieve promising results, they have been shown to degrade the original knowledge of LLMs. We argue that residual distribution leads to this issue. To explore this, we conduct a comprehensive analysis of residual distribution in locate-then-edit methods from both empirical and theoretical perspectives, revealing that residual distribution introduces editing errors, leading to inaccurate edits. To address this issue, we propose the Boundary Layer UpdatE (BLUE) strategy to enhance locate-then-edit methods. Sequential batch editing experiments on three LLMs and two datasets demonstrate that BLUE not only delivers an average performance improvement of 35.59%, significantly advancing the state of the art in model editing, but also enhances the preservation of LLMs' general capabilities. Our code is available at https://github.com/xpq-tech/BLUE.
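The residual-distribution step can be pictured with toy numbers: a residual computed at the last critical layer is spread evenly across the critical layers, whereas a boundary-style update concentrates it at the endpoint layers. The exact boundary split below is a guess for illustration, not the BLUE algorithm.

```python
# Schematic contrast: even residual distribution vs. a boundary-only update.
import numpy as np

critical = [4, 5, 6, 7]                # assumed critical layers
residual = np.array([1.0, -2.0, 0.5])  # residual at the last critical layer

# Standard locate-then-edit: spread the residual evenly across critical layers.
even_share = {l: residual / len(critical) for l in critical}

# Boundary-style alternative (illustrative split): update only the endpoints.
boundary = {critical[0]: residual / 2, critical[-1]: residual / 2}

print("even distribution:", {l: v.round(2).tolist() for l, v in even_share.items()})
print("boundary update:  ", {l: v.round(2).tolist() for l, v in boundary.items()})
```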
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
Liu, Yuhan, Huang, Yuyang, Yao, Jiayi, Gu, Zhuohan, Du, Kuntai, Li, Hanchen, Cheng, Yihua, Jiang, Junchen, Lu, Shan, Musuvathi, Madan, Choukse, Esha
Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaboratively address complex tasks. However, these systems face significant inefficiencies due to redundant processing of the shared context. We propose DroidSpeak, a framework that optimizes context sharing between fine-tuned LLMs derived from the same foundational model. DroidSpeak identifies critical layers in the KV cache and selectively recomputes them, enabling effective reuse of intermediate data while maintaining high accuracy. Our approach balances computational efficiency and task fidelity, significantly reducing inference latency and throughput bottlenecks. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 3x higher throughput and 2.6x faster prefill times with negligible accuracy loss compared to full recomputation.
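A toy picture of selective KV reuse: keep model A's cached (K, V) for most layers and recompute only the critical ones with model B's own projections. The cache layout and layer choice here are assumptions; DroidSpeak's layer-selection heuristic is not reproduced.

```python
# Toy selective KV reuse between two fine-tuned variants of one base model.
import torch

n_layers, seq, d = 8, 16, 32
critical = {2, 5}  # layers whose KV the receiving model recomputes

# KV cache produced by model A over the shared context: one (K, V) per layer.
cache_a = [(torch.randn(seq, d), torch.randn(seq, d)) for _ in range(n_layers)]

# Stand-ins for model B's per-layer key/value projections.
proj_k = [torch.nn.Linear(d, d) for _ in range(n_layers)]
proj_v = [torch.nn.Linear(d, d) for _ in range(n_layers)]
hidden = torch.randn(seq, d)  # model B's hidden states on the shared context

cache_b = [
    (proj_k[i](hidden), proj_v[i](hidden)) if i in critical else cache_a[i]
    for i in range(n_layers)
]
reused = n_layers - len(critical)
print(f"reused {reused}/{n_layers} layers of A's KV; recomputed {sorted(critical)}")
```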
Criticality Leveraged Adversarial Training (CLAT) for Boosted Performance via Parameter Efficiency
Gopal, Bhavna, Yang, Huanrui, Zhang, Jingyang, Horton, Mark, Chen, Yiran
Adversarial training enhances neural network robustness but tends to overfit, increasing generalization error on clean data. This work introduces CLAT, an approach that mitigates adversarial overfitting by introducing parameter efficiency into the adversarial training process, improving both clean accuracy and adversarial robustness. Instead of tuning the entire model, CLAT identifies and fine-tunes the robustness-critical layers, those predominantly learning non-robust features, while freezing the rest of the model. It employs dynamic critical-layer selection to adapt to changes in layer criticality throughout fine-tuning. Empirically, CLAT can be applied on top of existing adversarial training methods; it reduces the number of trainable parameters by approximately 95% and achieves more than a 2% improvement in adversarial robustness over baseline methods.
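The parameter-efficiency mechanism reduces to freezing everything except the selected layers. The criticality score below is a placeholder for CLAT's measure of non-robust feature learning; only the freeze/unfreeze bookkeeping is real.

```python
# Critical-layer fine-tuning sketch: train only the layers flagged as critical.
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(6)])

def criticality(layer):
    # Placeholder score; CLAT uses a measure of non-robust feature learning.
    return layer.weight.abs().mean().item()

scores = [criticality(layer) for layer in model]
critical = sorted(range(len(model)), key=lambda i: -scores[i])[:2]

for i, layer in enumerate(model):
    for p in layer.parameters():
        p.requires_grad = i in critical  # freeze all non-critical layers

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"critical layers {critical}: {trainable}/{total} trainable params")
```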
Backdoor Defense via Suppressing Model Shortcuts
Yang, Sheng, Li, Yiming, Jiang, Yong, Xia, Shu-Tao
Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks during the training process. Specifically, adversaries embed hidden backdoors in DNNs so that malicious model predictions can be activated through pre-defined trigger patterns. In this paper, we explore the backdoor mechanism from the angle of model structure. We focus on the skip connection, inspired by the understanding that it helps the learning of model 'shortcuts', where backdoor triggers are usually easier to learn. Specifically, we demonstrate that the attack success rate (ASR) decreases significantly when the outputs of some key skip connections are reduced. Based on this observation, we design a simple yet effective backdoor removal method that suppresses the skip connections in critical layers selected by our method. We also fine-tune these layers to recover high benign accuracy and further reduce the ASR. Extensive experiments on benchmark datasets verify the effectiveness of our method.
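A residual block with a scaling factor on its skip connection makes the suppression knob concrete; choosing which layers to suppress and measuring the ASR are omitted here, so this is only the mechanism, not the defense.

```python
# Residual block with a suppressible skip connection.
import torch
import torch.nn as nn

class SuppressibleBlock(nn.Module):
    def __init__(self, d=32, gamma=1.0):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.gamma = gamma  # gamma < 1 suppresses the shortcut path

    def forward(self, x):
        return self.gamma * x + self.body(x)

x = torch.randn(4, 32)
for gamma in (1.0, 0.5, 0.0):
    block = SuppressibleBlock(gamma=gamma)
    print(f"gamma={gamma}: output norm {block(x).norm():.2f}")
```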
Defending Against Backdoor Attacks by Layer-wise Feature Analysis
Jebreel, Najeeb Moharram, Domingo-Ferrer, Josep, Li, Yiming
Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., a backdoor attack) against DNNs, which aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to peak at a critical layer, which is not always the one typically used in existing defenses, namely the layer before the fully-connected layers. We also demonstrate how to locate this critical layer based on the behavior of benign samples. We then propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets confirm the effectiveness of our defense.
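The layer-wise analysis boils down to comparing suspicious and benign features per layer and flagging the layer of maximum divergence. Random tensors below stand in for real activations, and cosine distance between class means is one plausible metric, not necessarily the paper's.

```python
# Layer-wise feature-gap sketch: flag the layer of maximum divergence.
import torch

n_layers, d = 8, 64
benign = [torch.randn(100, d) for _ in range(n_layers)]   # stand-in activations
suspect = [torch.randn(100, d) for _ in range(n_layers)]

def divergence(a, b):
    # Cosine distance between class-mean features at one layer.
    return 1 - torch.cosine_similarity(a.mean(0), b.mean(0), dim=0).item()

gaps = [divergence(benign[i], suspect[i]) for i in range(n_layers)]
critical_layer = max(range(n_layers), key=gaps.__getitem__)
print(f"max feature gap at layer {critical_layer}: {gaps[critical_layer]:.3f}")
```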