Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
Neural Information Processing Systems
The rapid development of Large Language Models (LLMs) has brought remarkable success, making it urgent to understand and rectify their complex internal mechanisms. Recent research has attempted to interpret their behavior through the lens of internal representations. However, developing practical and efficient methods that apply these representations to general and flexible model editing remains challenging.
Oct-10-2025, 19:39:42 GMT
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Information Technology (0.46)
- Government (0.46)
- Education (0.46)
- Technology: