AITopics

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsDec-27-2025, 14:24:15 GMT

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

The two-stage fine-tuning (FT) method, linear probing (LP) then fine-tuning (LP-FT), outperforms linear probing and FT alone. This holds true for both in-distribution (ID) and out-of-distribution (OOD) data. One key reason for its success is the preservation of pre-trained features, achieved by obtaining a near-optimal linear head during LP. However, despite the widespread use of large language models, there has been limited exploration of more complex architectures such as Transformers. In this paper, we analyze the training dynamics of LP-FT for classification tasks on the basis of the neural tangent kernel (NTK) theory.

artificial intelligence, large language model, natural language, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.39)

arXiv.org Machine LearningNov-18-2025

A Closer Look at Personalized Fine-Tuning in Heterogeneous Federated Learning

Chen, Minghui, Ghoukasian, Hrad, Jin, Ruinan, Wang, Zehua, Karimireddy, Sai Praneeth, Li, Xiaoxiao

Federated Learning (FL) enables decentralized, privacy-preserving model training but struggles to balance global generalization and local personalization due to non-identical data distributions across clients. Personalized Fine-Tuning (PFT), a popular post-hoc solution, fine-tunes the final global model locally but often overfits to skewed client distributions or fails under domain shifts. We propose adapting Linear Probing followed by full Fine-Tuning (LP-FT), a principled centralized strategy for alleviating feature distortion (Kumar et al., 2022), to the FL setting. Through systematic evaluation across seven datasets and six PFT variants, we demonstrate LP-FT's superiority in balancing personalization and generalization. Our analysis uncovers federated feature distortion, a phenomenon where local fine-tuning destabilizes globally learned features, and theoretically characterizes how LP-FT mitigates this via phased parameter updates. We further establish conditions (e.g., partial feature overlap, covariate-concept shift) under which LP-FT outperforms standard fine-tuning, offering actionable guidelines for deploying robust personalization in FL.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2511.12695

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsOct-10-2025, 22:21:19 GMT

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

classifier weight norm, lp-ft, matrix, (16 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsMay-27-2025, 21:57:11 GMT

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

fine-tuning language model, linear head norm, ntk perspective, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.41)

Tomihari, Akiyoshi, Sato, Issei

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

arXiv.org Artificial IntelligenceMay-26-2024

The two-stage fine-tuning (FT) method, linear probing then fine-tuning (LP-FT), consistently outperforms linear probing (LP) and FT alone in terms of accuracy for both in-distribution (ID) and out-of-distribution (OOD) data. This success is largely attributed to the preservation of pre-trained features, achieved through a near-optimal linear head obtained during LP. However, despite the widespread use of large language models, the exploration of complex architectures such as Transformers remains limited. In this paper, we analyze the training dynamics of LP-FT for classification models on the basis of the neural tangent kernel (NTK) theory. Our analysis decomposes the NTK matrix into two components, highlighting the importance of the linear head norm alongside the prediction accuracy at the start of the FT stage. We also observe a significant increase in the linear head norm during LP, stemming from training with the cross-entropy (CE) loss, which effectively minimizes feature changes. Furthermore, we find that this increased norm can adversely affect model calibration, a challenge that can be addressed by temperature scaling. Additionally, we extend our analysis with the NTK to the low-rank adaptation (LoRA) method and validate its effectiveness. Our experiments with a Transformer-based model on natural language processing tasks across multiple benchmarks confirm our theoretical analysis and demonstrate the effectiveness of LP-FT in fine-tuning language models. Code is available at https://github.com/tom4649/lp-ft_ntk.

classifier weight norm, lp-ft, matrix, (16 more...)

2405.16747

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Wen, Ziting, Pizarro, Oscar, Williams, Stefan

Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models

arXiv.org Artificial IntelligenceMar-2-2024

Fine-tuning the pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet, this approach often incurs a significant loss in active learning performance, which may even outweigh the computational cost savings. In this paper, we argue the performance drop stems not only from pre-computed features' inability to distinguish between categories of labeled samples, resulting in the selection of redundant samples but also from the tendency to compromise valuable pre-trained information when fine-tuning with samples selected through the proxy model. To address this issue, we propose a novel method called aligned selection via proxy to update pre-computed features while selecting a proper training method to inherit valuable pre-training information. Extensive experiments validate that our method significantly improves the total cost of efficient active learning while maintaining computational efficiency.

active learning, active learning strategy, learning, (12 more...)

2403.01101

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial IntelligenceMay-18-2023

Learning to Generalize for Cross-domain QA

Niu, Yingjie, Yang, Linyi, Dong, Ruihai, Zhang, Yue

There have been growing concerns regarding the out-of-domain generalization ability of natural language processing (NLP) models, particularly in question-answering (QA) tasks. Current synthesized data augmentation methods for QA are hampered by increased training costs. To address this issue, we propose a novel approach that combines prompting methods and linear probing then fine-tuning strategy, which does not entail additional cost. Our method has been theoretically and empirically shown to be effective in enhancing the generalization ability of both generative and discriminative models. Our approach outperforms state-of-the-art baselines, with an average increase in F1 score of 4.5%-7.9%. Furthermore, our method can be easily integrated into any pre-trained models and offers a promising solution to the under-explored cross-domain QA task. We release our source code at GitHub*.

information retrieval, natural language, question answering, (21 more...)

2305.08208

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.46)

Industry:

Consumer Products & Services > Restaurants (0.94)
Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

arXiv.org Artificial IntelligenceApr-24-2023

Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation

Lee, Suho, Seo, Seungwon, Kim, Jihyo, Lee, Yejin, Hwang, Sangheum

Recently, source-free unsupervised domain adaptation (SFUDA) has emerged as a more practical and feasible approach compared to unsupervised domain adaptation (UDA) which assumes that labeled source data are always accessible. However, significant limitations associated with SFUDA approaches are often overlooked, which limits their practicality in real-world applications. These limitations include a lack of principled ways to determine optimal hyperparameters and performance degradation when the unlabeled target data fail to meet certain requirements such as a closed-set and identical label distribution to the source data. All these limitations stem from the fact that SFUDA entirely relies on unlabeled target data. We empirically demonstrate the limitations of existing SFUDA methods in real-world scenarios including out-of-distribution and label distribution shifts in target data, and verify that none of these methods can be safely applied to real-world settings. Based on our experimental results, we claim that fine-tuning a source pretrained model with a few labeled data (e.g., 1- or 3-shot) is a practical and reliable solution to circumvent the limitations of SFUDA. Contrary to common belief, we find that carefully fine-tuned models do not suffer from overfitting even when trained with only a few labeled data, and also show little change in performance due to sampling bias. Our experimental results on various domain adaptation benchmarks demonstrate that the few-shot fine-tuning approach performs comparatively under the standard SFUDA settings, and outperforms comparison methods under realistic scenarios. Our code is available at https://github.com/daintlab/fewshot-SFDA .

artificial intelligence, machine learning, sfuda method, (19 more...)

2304.00792

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Yemen > Amran Governorate > Amran (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

arXiv.org Artificial IntelligenceMar-28-2023

Parameter-Efficient Tuning Makes a Good Classification Head

Yang, Zhuoyi, Ding, Ming, Guo, Yanhui, Lv, Qingsong, Tang, Jie

In recent years, pretrained models revolutionized the paradigm of natural language understanding (NLU), where we append a randomly initialized classification head after the pretrained backbone, e.g. BERT, and finetune the whole model. As the pretrained backbone makes a major contribution to the improvement, we naturally expect a good pretrained classification head can also benefit the training. However, the final-layer output of the backbone, i.e. the input of the classification head, will change greatly during finetuning, making the usual head-only pretraining (LP-FT) ineffective. In this paper, we find that parameter-efficient tuning makes a good classification head, with which we can simply replace the randomly initialized heads for a stable performance gain. Our experiments demonstrate that the classification head jointly pretrained with parameter-efficient tuning consistently improves the performance on 9 tasks in GLUE and SuperGLUE.

arxiv preprint arxiv, machine learning, natural language, (18 more...)

2210.16771

Country:

North America > United States (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > China (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)