- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Sensing and Signal Processing > Image Processing (0.92)
Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
On the other hand, recent findings on the neural tangent kernel enable us to approximate a wide neural network with a linear model built on the network's random features. In this paper, we theoretically analyze knowledge distillation for a wide neural network. First, we provide a transfer risk bound for the linearized model of the network. Then, we propose a metric of the task's training difficulty, called data inefficiency.
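To make the linearization concrete, here is a minimal PyTorch sketch of the first-order (NTK-style) approximation the abstract refers to: the network is replaced by its tangent model around initialization, f_lin(x; θ) = f(x; θ₀) + J f(x; θ₀)·(θ − θ₀). The helper name `linearize` and the toy dimensions are illustrative assumptions, not from the paper.

```python
import torch
from torch.func import functional_call, jvp

def linearize(model, params0):
    """Return f_lin(x; params), the first-order expansion of `model`
    around the frozen reference point `params0`."""
    def f(params, x):
        return functional_call(model, params, (x,))

    def f_lin(params, x):
        # Displacement in parameter space since initialization.
        dparams = {k: params[k] - params0[k] for k in params0}
        # jvp returns (f(params0, x), J f(params0, x) @ dparams).
        y0, jvp_out = jvp(lambda p: f(p, x), (params0,), (dparams,))
        return y0 + jvp_out

    return f_lin

# Usage: in the wide (NTK) regime the paper analyzes, f_lin tracks the
# full network closely throughout training.
model = torch.nn.Sequential(torch.nn.Linear(10, 4096), torch.nn.ReLU(),
                            torch.nn.Linear(4096, 1))
params0 = {k: v.detach().clone() for k, v in model.named_parameters()}
f_lin = linearize(model, params0)
x = torch.randn(8, 10)
print(f_lin(dict(model.named_parameters()), x).shape)  # torch.Size([8, 1])
```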
Supplementary Information 1: Simulation parameters
All simulations were based on PyTorch [5]. For the nonlinear neuroscience tasks, we applied the gradient-based optimizer Adam [4] to the recurrent weights W as well as to the input and output vectors m_i, w_i. We checked that our results did not depend qualitatively on the choice of Adam over plain gradient descent; however, training converged more easily with Adam. We also checked that restricting training to W only (as for the simple model) did not alter our results qualitatively (although, with this restriction, training on the Romo task did not converge for small values of g). Code for reproducing our results can be found at https://github.com/frschu/neurips_
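A minimal sketch of the training setup described above, assuming a rate-based RNN whose recurrent matrix W and input/output vectors m, w are all optimized with Adam. Dimensions, the task input, the target, and all hyperparameters are placeholders, not the paper's values.

```python
import torch

N, T, lr = 256, 100, 1e-3
g = 0.8  # initial coupling strength of the random recurrent matrix

W = torch.nn.Parameter(g * torch.randn(N, N) / N**0.5)
m = torch.nn.Parameter(torch.randn(N) / N**0.5)   # input vector m_i
w = torch.nn.Parameter(torch.randn(N) / N**0.5)   # output vector w_i
opt = torch.optim.Adam([W, m, w], lr=lr)          # Adam on W, m, w jointly

def run_trial(u):
    """Simulate the rate dynamics and read out z_t = w . tanh(x_t)."""
    x = torch.zeros(N)
    zs = []
    for t in range(T):
        # Euler step of dx/dt = -x + W phi(x) + m u(t), with phi = tanh.
        x = x + 0.1 * (-x + W @ torch.tanh(x) + m * u[t])
        zs.append(w @ torch.tanh(x))
    return torch.stack(zs)

for step in range(200):
    u = torch.randn(T)       # placeholder input; the paper uses task inputs
    target = torch.zeros(T)  # placeholder target
    loss = ((run_trial(u) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```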
Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness
The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recently, it has been shown that, even without poisoned data, unlearning the model with clean data and then learning a pruning mask can contribute to backdoor defense, and that vanilla fine-tuning on the same clean data helps recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and vanilla fine-tuning unintentionally reintroduces the backdoor effect. In this work, we first investigate model unlearning from the perspective of weight changes and gradient norms, and make two observations about backdoored models: 1) the weight changes under poison unlearning and clean unlearning are positively correlated, making it possible to identify backdoor-related neurons without using poisoned data; 2) the neurons of a backdoored model are more active (i.e., have larger gradient norms) than those of a clean model, suggesting the need to suppress the gradient norm during fine-tuning. We then propose an effective two-stage defense method. In the first stage, based on observation 1), we reinitialize the neurons whose weights change most under clean unlearning. In the second stage, based on observation 2), we replace vanilla fine-tuning with a fine-tuning procedure that suppresses the gradient norm. Extensive experiments, involving eight backdoor attacks on three benchmark datasets, demonstrate the superior performance of our proposed method compared to recent state-of-the-art backdoor defense approaches.
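A hedged sketch of how the two observations translate into a two-stage defense: rank neurons by how much their weights change under clean unlearning and reinitialize the most-changed ones, then fine-tune on clean data while suppressing large gradients. The fraction, the use of gradient clipping as a stand-in for gradient-norm suppression, and `clean_loader` are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def neuron_weight_change(model, unlearned):
    """Per-neuron L2 change of each output row between the original
    model and its clean-unlearned copy (same architecture assumed)."""
    changes = {}
    for (name, p0), (_, p1) in zip(model.named_parameters(),
                                   unlearned.named_parameters()):
        if p0.dim() >= 2:  # weight matrices / conv kernels only
            changes[name] = (p1 - p0).flatten(1).norm(dim=1)
    return changes

def stage1_reinit(model, changes, frac=0.05):
    """Stage 1: reinitialize the incoming weights of the top-`frac`
    most-changed neurons (the candidates for backdoor-related neurons)."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in changes:
                k = max(1, int(frac * changes[name].numel()))
                idx = changes[name].topk(k).indices
                p[idx] = torch.randn_like(p[idx]) * p.std()

def stage2_finetune(model, clean_loader, max_grad_norm=1.0, lr=1e-3):
    """Stage 2: fine-tune on clean data; gradient-norm clipping is a
    simple stand-in for the suppression the abstract motivates."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for x, y in clean_loader:
        loss = loss_fn(model(x), y)
        opt.zero_grad(); loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        opt.step()
```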
Adapting Large Language Models to Low-Resource Tibetan: A Two-Stage Continual and Supervised Fine-Tuning Study
Chen, Lifeng, Lai, Ryan, Liu, Tianming
Adapting large language models (LLMs) to low-resource languages remains a major challenge due to data scarcity and cross-lingual drift. This work presents a two-stage adaptation of Qwen2.5-3B to Tibetan, a morphologically rich and underrepresented language. We employ Continual Pretraining (CPT) to establish Tibetan linguistic grounding, followed by Supervised Fine-Tuning (SFT) for task and translation specialization. Empirical evaluations demonstrate a consistent decrease in perplexity (from 2.98 $\rightarrow$ 1.54) and substantial improvements in Chinese$\rightarrow$Tibetan translation quality (BLEU: 0.046 $\rightarrow$ 0.261; chrF: 2.2 $\rightarrow$ 6.6). Layer-wise analysis across 435 layers in Qwen3-4B reveals that adaptation primarily concentrates on embedding and output heads, with mid--late MLP projections encoding domain-specific transformations. Our findings suggest that CPT constructs a Tibetan semantic manifold while SFT sharpens task alignment with minimal representational disruption. This study provides the first quantitative exploration of Tibetan adaptation dynamics for LLMs, and offers an open, reproducible framework for extending multilingual foundation models to low-resource settings.
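For reference, the perplexity figures quoted above follow from the standard causal-LM evaluation: perplexity is the exponential of the mean token-level cross-entropy. A minimal sketch with Hugging Face Transformers, loading the paper's base model; the evaluation texts and one-sequence-at-a-time batching are placeholders.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-3B"  # base model named in the abstract
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def perplexity(texts):
    """exp of the mean next-token cross-entropy over all texts."""
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tok(text, return_tensors="pt").input_ids
        # Passing labels=input_ids makes the model return the shifted
        # language-modeling loss averaged over predicted tokens.
        out = model(ids, labels=ids)
        n = ids.numel() - 1
        total_nll += out.loss.item() * n
        total_tokens += n
    return math.exp(total_nll / total_tokens)

# Schematically: run CPT on raw Tibetan text with this same LM loss,
# then SFT on instruction/translation pairs, and compare perplexity()
# on held-out Tibetan text before and after each stage.
```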
- North America > United States > New York (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- North America > United States (0.46)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Switzerland > Basel-City > Basel (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Education (1.00)
- Energy (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.89)