Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models

Oct-10-2025, 19:39:42 GMT–Neural Information Processing Systems

As Large Language Models (LLMs) have developed rapidly and achieved remarkable success, understanding and rectifying their complex internal mechanisms has become an urgent issue. Recent research has attempted to interpret their behavior through the lens of internal representations. However, developing practical and efficient methods that apply these representations to general and flexible model editing remains challenging.

discriminator, language model, representation

Country:
- Europe > Romania
  - Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia
  - Singapore (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - China
    - Zhejiang Province > Hangzhou (0.04)
    - Beijing > Beijing (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Information Technology (0.46)
- Government (0.46)
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
