Model Editing for Vision Transformers

Jun-17-2026, 18:22:08 GMT–Neural Information Processing Systems

Model editing offers a promising paradigm for efficiently and precisely updating knowledge in pre-trained transformers without costly retraining. While extensively studied in language models (LMs), model editing for vision transformers (ViTs) remains underexplored. Existing methods typically adapt LM-based techniques by modifying the multi-layer perceptron (MLP) modules, overlooking the unique characteristics of ViTs. In this work, we show that ViT predictions are more strongly influenced by the multi-head self-attention (MSA) modules than by the MLPs. Building on this observation, we propose a two-stage framework for editing ViTs. First, we identify the attention heads most responsible for an incorrect prediction. Next, we selectively remove the corresponding features to correct the model's prediction. To further balance error correction with predictive stability on unrelated data, we learn a projection matrix that refines the image representations. Extensive experiments across multiple real-world datasets and model editing benchmarks demonstrate that our method consistently outperforms existing model editing methods for ViTs, achieving superior generalization and locality.
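The abstract does not specify how heads are scored or how their features are removed, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. It assumes the final class-token representation decomposes additively into per-head contributions plus a residual term (MLPs, embeddings), ignoring the final LayerNorm, and that a linear head produces the logits. Each head is scored by its direct contribution to the logit margin of the wrong class over the true class, and the top-k heads' contributions are deleted. The function names `head_logit_scores`, `edit_by_head_removal`, and `projection_removing` are hypothetical. The projection step uses a fixed orthogonal projector as a stand-in; the paper learns its projection matrix, which this sketch does not attempt.

```python
import numpy as np


def head_logit_scores(head_contribs, W, wrong, true):
    """Score each head by its direct push on the logit margin of `wrong` over `true`.

    head_contribs: (num_heads, d) per-head contributions to the final
        representation (ASSUMED additive decomposition, hypothetical here).
    W: (num_classes, d) weights of a linear classification head.
    """
    margin_direction = W[wrong] - W[true]
    return head_contribs @ margin_direction


def edit_by_head_removal(head_contribs, residual, W, wrong, true, k=1):
    """Drop the k heads that most favor the wrong class and recompute logits."""
    scores = head_logit_scores(head_contribs, W, wrong, true)
    culprits = np.argsort(scores)[::-1][:k]  # highest score = most responsible
    kept = np.delete(head_contribs, culprits, axis=0)
    edited_repr = kept.sum(axis=0) + residual  # residual: non-attention terms
    return culprits, W @ edited_repr


def projection_removing(directions):
    """Fixed orthogonal projector onto the complement of span(directions).

    Stand-in only: the paper learns its projection matrix; this does not.
    Assumes the rows of `directions` are linearly independent.
    """
    Q, _ = np.linalg.qr(np.atleast_2d(directions).T)  # (d, r) orthonormal basis
    return np.eye(Q.shape[0]) - Q @ Q.T


if __name__ == "__main__":
    W = np.eye(2)                      # toy 2-class linear head, d = 2
    heads = np.array([[1.0, 0.0],      # head 0 favors class 0
                      [0.0, 3.0],      # head 1 strongly favors class 1
                      [0.5, 0.5]])     # head 2 is neutral
    residual = np.array([0.5, 0.0])
    original = W @ (heads.sum(axis=0) + residual)
    culprits, edited = edit_by_head_removal(heads, residual, W, wrong=1, true=0)
    # original predicts class 1 (wrong); removing head 1 yields class 0
    print(int(np.argmax(original)), culprits, int(np.argmax(edited)))
```

In the toy example, head 1 alone drives the margin toward the wrong class, so deleting its contribution flips the prediction. A fixed projector such as `projection_removing` has no mechanism for preserving predictions on unrelated inputs, which is the trade-off the learned projection described above is meant to manage.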

artificial intelligence, machine learning, natural language

Country:
- North America (0.28)
- Asia (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Perceptrons (0.54)