AITopics

Genre: Research Report > New Finding (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-9-2026, 10:14:48 GMT

95e62984b87e90645a5cf77037395959-Supplemental.pdf

finetuned model, influence function score, section 4, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Neural Information Processing Systems, Feb-9-2026, 10:14:41 GMT

95e62984b87e90645a5cf77037395959-Paper.pdf

influence function, influence function score, influence score, (16 more...)

Country:

Europe > United Kingdom (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Ohio (0.04)
(5 more...)

Genre: Research Report (0.93)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Dec-24-2025, 07:49:04 GMT

Multi-Stage Influence Function

Multi-stage training and knowledge transfer, from a large-scale pretraining task to various finetuning tasks, have revolutionized natural language processing and computer vision, resulting in state-of-the-art performance improvements. In this paper, we develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data. With this score, we can identify the pretraining examples that contribute most to a prediction in the finetuning task. The proposed multi-stage influence function generalizes the original single-model influence function of Koh & Liang (2017), thereby enabling influence computation through both pretrained and finetuned models. We study two scenarios, with the pretrained embedding either fixed or updated during finetuning. We test our proposed method in various experiments to show its effectiveness and potential applications.
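
For orientation, the single-model influence function that the abstract cites can be illustrated on a one-parameter toy problem. The following is a minimal sketch, assuming a 1-D least-squares model; it does not implement the paper's multi-stage score, and the data and all function names are hypothetical.

```python
# Toy single-model influence function in the style of Koh & Liang (2017).
# NOT the multi-stage score from the paper. Assumed model: y ~ w * x with
# per-example loss 0.5 * (w*x - y)^2. All names and data are illustrative.

def fit(data, lam=0.0):
    """Closed-form minimizer of the mean loss plus 0.5 * lam * w^2."""
    n = len(data)
    sxy = sum(x * y for x, y in data) / n
    sxx = sum(x * x for x, _ in data) / n
    return sxy / (sxx + lam)

def loss(w, z):
    x, y = z
    return 0.5 * (w * x - y) ** 2

def grad(w, z):
    """Derivative of the per-example loss with respect to w."""
    x, y = z
    return (w * x - y) * x

def hessian(data, lam=0.0):
    """Second derivative of the regularized mean training loss (a scalar here)."""
    return sum(x * x for x, _ in data) / len(data) + lam

def influence(w, data, z_train, z_test, lam=0.0):
    """I(z_train, z_test) = -grad(z_test) * H^{-1} * grad(z_train)."""
    return -grad(w, z_test) * grad(w, z_train) / hessian(data, lam)

train = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
z_test = (2.5, 2.0)
w_hat = fit(train)

# One score per training example: the first-order effect on the test loss
# of upweighting that example.
scores = [influence(w_hat, train, z, z_test) for z in train]

# Removing example i corresponds to a weight change of -1/n, so the
# predicted change in test loss is -scores[i] / n.
predicted_loo = [-s / len(train) for s in scores]
```

With only four training points the first-order estimate is rough, but it can be compared against an actual leave-one-out refit (`fit` on the reduced data) to check that sign and order of magnitude agree.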

electronic proceedings, multi-stage influence function, name change, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.99)
Information Technology > Artificial Intelligence > Machine Learning (0.65)

Raimondi, Bianca, Dalbagno, Daniela, Gabbrielli, Maurizio

Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability

arXiv.org Artificial Intelligence, Dec-8-2025

Large language models (LLMs) have been shown to internalize human-like biases during finetuning, yet the mechanisms by which these biases manifest remain unclear. In this work, we investigated whether the well-known Knobe effect, a moral bias in intentionality judgements, emerges in finetuned LLMs and whether it can be traced back to specific components of the model. We conducted a Layer-Patching analysis across 3 open-weights LLMs and demonstrated that the bias is not only learned during finetuning but also localized in a specific set of layers. Surprisingly, we found that patching activations from the corresponding pretrained model into just a few critical layers is sufficient to eliminate the effect. Our findings offer new evidence that social biases in LLMs can be interpreted, localized, and mitigated through targeted interventions, without the need for model retraining.
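
The patching idea described in this abstract can be sketched on a toy network. This is a minimal illustration, assuming two small MLPs with identical architecture stand in for the pretrained and finetuned models; the weights, function names, and layer layout are hypothetical and are not taken from the paper, whose analysis targets transformer layers.

```python
# Toy layer patching: run the finetuned model, but overwrite the output of
# selected layers with the activations the pretrained model produced on the
# same input. Assumed setup: plain MLPs given as lists of weight matrices.

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def forward(layers, x, patch=None):
    """Return (output, per-layer activations).

    patch: optional dict {layer_index: activation}; each entry replaces the
    output of that layer before it is passed on.
    """
    acts = []
    h = x
    for i, W in enumerate(layers):
        h = matvec(W, h)
        if i < len(layers) - 1:  # no nonlinearity on the output layer
            h = relu(h)
        if patch is not None and i in patch:
            h = list(patch[i])
        acts.append(h)
    return h, acts

def patch_layers(pretrained, finetuned, x, layer_ids):
    """Finetuned forward pass with pretrained activations at layer_ids."""
    _, clean = forward(pretrained, x)
    out, _ = forward(finetuned, x, patch={i: clean[i] for i in layer_ids})
    return out

# Hypothetical weights: the two models differ only in layer 1.
identity = [[1.0, 0.0], [0.0, 1.0]]
pretrained = [identity, identity, [[1.0, 1.0]]]
finetuned = [identity, [[2.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]

x = [1.0, 1.0]
out_pre, _ = forward(pretrained, x)
out_fin, _ = forward(finetuned, x)
out_patch_0 = patch_layers(pretrained, finetuned, x, [0])
out_patch_1 = patch_layers(pretrained, finetuned, x, [1])
```

In this toy, patching layer 1 restores the pretrained output while patching layer 0 changes nothing, which is the kind of signal used to attribute a behavioural difference to specific layers.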

artificial intelligence, large language model, natural language, (16 more...)

2510.12229

Country:

Europe > Italy (0.14)
North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence, Nov-11-2025

Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers

Nief, Todd, Reber, David, Richardson, Sean, Holtzman, Ari

When an LLM learns a new fact during finetuning (e.g., new movie releases, newly elected pope, etc.), where does this information go? Are entities enriched with relation information, or do models recall information just-in-time before a prediction? Or, are "all of the above" true with LLMs implementing multiple redundant heuristics? Existing localization approaches (e.g., activation patching) are ill-suited for this analysis because they usually replace parts of the residual stream, thus overriding previous information. To fill this gap, we propose dynamic weight grafting, a technique that selectively grafts weights from a finetuned model onto a pretrained model. Using this technique, we show two separate pathways for retrieving finetuned relation information: 1) "enriching" the residual stream with relation information while processing the tokens that correspond to an entity (e.g., "Zendaya" in "Zendaya co-starred with John David Washington") and 2) "recalling" this information at the final token position before generating a target fact. In some cases, models need information from both of these pathways to correctly generate finetuned facts while, in other cases, either the "enrichment" or "recall" pathway alone is sufficient. We localize the "recall" pathway to model components, finding that "recall" occurs via both task-specific attention mechanisms and an entity-specific extraction step in the feedforward networks of the final layers before the target prediction. By targeting model components and parameters, as opposed to just activations, we are able to understand the mechanisms by which finetuned knowledge is retrieved during generation.
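
The abstract does not spell out the mechanics of grafting, so the following is only a sketch under a stated assumption: that grafting means choosing, per layer and per token position, whether the finetuned or the pretrained weights are applied. The scalar "models", the running-mean mixing step (a stand-in for attention), and every name below are hypothetical and are not the authors' implementation.

```python
# Toy position-wise weight grafting. Assumption: each "model" is a list of
# per-layer scalar weights, and a graft set of (layer, position) pairs says
# where the finetuned weight replaces the pretrained one.

def run(pre, fin, tokens, graft):
    """Return the final-position output of the grafted toy model."""
    h = list(tokens)
    for layer, (w_pre, w_fin) in enumerate(zip(pre, fin)):
        # Position-wise transform, with the weight chosen per position.
        h = [(w_fin if (layer, t) in graft else w_pre) * v
             for t, v in enumerate(h)]
        # Causal mixing: each position gets the running mean of positions
        # up to and including itself (toy stand-in for attention).
        mixed, total = [], 0.0
        for t, v in enumerate(h):
            total += v
            mixed.append(total / (t + 1))
        h = mixed
    return h[-1]

pre = [1.0, 1.0]   # hypothetical pretrained weights
fin = [2.0, 1.0]   # hypothetical finetuned weights (layer 0 changed)
tokens = [1.0, 1.0, 1.0]
entity_pos, final_pos = 0, len(tokens) - 1

baseline = run(pre, fin, tokens, graft=set())
# Graft only while processing the entity token (an "enrichment"-style test).
enrich_only = run(pre, fin, tokens, graft={(0, entity_pos)})
# Graft only at the final token position (a "recall"-style test).
recall_only = run(pre, fin, tokens, graft={(0, final_pos)})
# Graft everywhere: equivalent to running the finetuned toy model.
full = run(pre, fin, tokens,
           graft={(l, t) for l in range(len(pre)) for t in range(len(tokens))})
```

Comparing `enrich_only` and `recall_only` against `baseline` and `full` shows how much of the finetuned behaviour each position-restricted graft recovers, which mirrors the kind of comparison the abstract describes.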

information, large language model, machine learning, (18 more...)

2506.20746

Country: Asia (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-29-2025

Accelerate Scaling of LLM Finetuning via Quantifying the Coverage and Depth of Instruction Set

Wu, Chengwei, Du, Li, Zhao, Hanyu, Ju, Yiming, Wang, Jiapu, Chen, Tianyu, Zhou, Haoyi

Scaling the amount of data used for supervised fine-tuning (SFT) does not guarantee proportional gains in model performance, highlighting a critical need to understand what makes training samples effective. This work identifies two fundamental dataset properties that govern SFT scalability: semantic coverage, or the breadth of task domains, and information depth, or the richness of individual examples. We demonstrate that simple proxies for these properties explain the majority of validation loss variance in our experiments. We further propose the Information Landscape Approximation (ILA), a model-agnostic data selection framework that jointly optimizes for these two factors. ILA constructs compact subsets that approximate the informational value of large datasets. Empirical results show that models tuned on ILA-selected data achieve faster and more sustained performance improvements across diverse tasks and model sizes compared to existing methods, a phenomenon we term accelerated scaling.

large language model, machine learning, natural language, (18 more...)

2509.06463

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ahrens, Lara, Haverkamp, Wilhelm, Strodthoff, Nils

ECG-LLM -- training and evaluation of domain-specific large language models for electrocardiography

arXiv.org Artificial Intelligence, Oct-22-2025

Optimal adaptation strategies, evaluation methodologies, and performance relative to general-purpose LLMs remain poorly characterized. We investigated these questions in electrocardiography, an important area of cardiovascular medicine, by finetuning open-weight models on domain-specific literature and implementing a multi-layered evaluation framework comparing finetuned models, retrieval-augmented generation (RAG), and Claude Sonnet 3.7 as a representative general-purpose model. Finetuned Llama 3.1 70B achieved superior performance on multiple-choice evaluations and automatic text metrics, ranking second to Claude 3.7 in LLM-as-a-judge assessments. Human expert evaluation favored Claude 3.7 and RAG approaches for complex queries. Finetuned models significantly outperformed their base counterparts across nearly all evaluation modes. Our findings reveal substantial performance heterogeneity across evaluation methodologies, underscoring assessment complexity. Nevertheless, domain-specific adaptation through finetuning and RAG achieves competitive performance with proprietary models, supporting the viability of privacy-preserving, locally deployable clinical solutions.

large language model, machine learning, natural language, (20 more...)

2510.18339

Country:

North America (0.46)
Europe > Germany (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Sep-5-2025

Delta Activations: A Representation for Finetuned Large Language Models

Xu, Zhiqiu, Sethi, Amish, Naik, Mayur, Lim, Ser-Nam

The success of powerful open-source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. This representation allows for effective clustering by domain and task, revealing structure in the model landscape. Delta Activations also demonstrates desirable properties: it is robust across finetuning settings and exhibits an additive property when finetuning datasets are mixed. In addition, we show that Delta Activations can embed tasks via few-shot finetuning, and further explore its use for model selection and merging. We hope Delta Activations can facilitate the practice of reusing publicly available models. Code is available at https://github.com/OscarXZQ/delta_activations.
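
The core idea, embedding a finetuned model by how its activations shift relative to a base model, can be sketched on toy linear "models". This is a minimal illustration under assumptions: the probe inputs, the use of a single linear layer as the "internal activation", the averaging over probes, and all names are hypothetical and are not taken from the linked repository.

```python
import math

# Toy delta-activation embedding. Assumption: each model is one weight
# matrix, and its "internal activation" on an input is the matrix-vector
# product. The embedding is the mean activation shift over fixed probes.

def hidden(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def delta_embedding(base, tuned, probes):
    """Mean of (tuned activation - base activation) over the probe inputs."""
    acc = [0.0] * len(base)
    for x in probes:
        hb, ht = hidden(base, x), hidden(tuned, x)
        for j in range(len(acc)):
            acc[j] += (ht[j] - hb[j]) / len(probes)
    return acc

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

base = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical finetuned variants: a and b changed the same unit, c another.
tuned_a = [[1.5, 0.0], [0.0, 1.0]]
tuned_b = [[1.4, 0.0], [0.0, 1.0]]
tuned_c = [[1.0, 0.0], [0.0, 1.5]]

probes = [[1.0, 1.0], [2.0, 1.0]]
emb_a = delta_embedding(base, tuned_a, probes)
emb_b = delta_embedding(base, tuned_b, probes)
emb_c = delta_embedding(base, tuned_c, probes)

sim_ab = cosine(emb_a, emb_b)  # variants that shifted the same unit
sim_ac = cosine(emb_a, emb_c)  # variants that shifted different units
```

Embeddings built this way can then be clustered or compared by cosine similarity; in the toy, the two variants that were changed in the same direction end up close together and the third does not.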

large language model, machine learning, natural language, (17 more...)