HuRef: HUman-REadable Fingerprint for Large Language Models
Neural Information Processing Systems
However, identifying the original base model of an LLM is challenging due to potential parameter alterations. In this study, we introduce HuRef, a human-readable fingerprint for LLMs that uniquely identifies the base model without interfering with training or exposing model parameters to the public. We first observe that the vector direction of LLM parameters remains stable after the model has converged during pretraining, with negligible perturbations through subsequent training steps, including continued pretraining, supervised fine-tuning, and RLHF, which makes it a sufficient condition for identifying the base model. Necessity is validated by continuing to train an LLM with an extra term that drives the parameter direction away from its original value, which damages the model. However, this direction is vulnerable to simple attacks like dimension permutation or matrix rotation, which significantly change it without affecting performance.
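The two observations above (direction stability under small updates, and fragility under permutation) can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy two-layer ReLU network stands in for an LLM, Gaussian noise stands in for fine-tuning, and the helper names `cosine_similarity`, `mlp`, and `flat` are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened parameter vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mlp(x, w1, w2):
    """Toy two-layer ReLU network (an assumed stand-in for an LLM block)."""
    return np.maximum(x @ w1, 0.0) @ w2

def flat(*weights):
    """Concatenate all weight matrices into one parameter vector."""
    return np.concatenate([w.ravel() for w in weights])

rng = np.random.default_rng(0)
w1 = rng.standard_normal((16, 64))
w2 = rng.standard_normal((64, 16))

# Assumption: a small Gaussian perturbation mimics post-pretraining updates.
w1_ft = w1 + 0.01 * rng.standard_normal(w1.shape)
w2_ft = w2 + 0.01 * rng.standard_normal(w2.shape)

# Permutation attack: reorder the hidden units consistently in both layers.
perm = rng.permutation(64)
w1_p = w1[:, perm]
w2_p = w2[perm, :]

sim_finetuned = cosine_similarity(flat(w1, w2), flat(w1_ft, w2_ft))
sim_permuted = cosine_similarity(flat(w1, w2), flat(w1_p, w2_p))

# The permuted network computes the same function as the original.
x = rng.standard_normal((8, 16))
outputs_match = bool(np.allclose(mlp(x, w1, w2), mlp(x, w1_p, w2_p)))

print(f"cosine after small perturbation: {sim_finetuned:.4f}")
print(f"cosine after permutation:        {sim_permuted:.4f}")
print(f"outputs unchanged by permutation: {outputs_match}")
```

In this toy setting the perturbed parameters stay almost parallel to the originals, while the permuted parameters point in a nearly unrelated direction even though the network output is unchanged. A rotation attack is not shown here.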
Mar-27-2025, 12:37:33 GMT