SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
Tong, Yao, Wang, Haonan, Li, Siquan, Kawaguchi, Kenji, Hu, Tianyang
–arXiv.org Artificial Intelligence
Fingerprinting Large Language Models (LLMs) is essential for provenance verification and model attribution. LLM fingerprints have recently been proposed as tools to identify, attribute, and trace LLMs by examining their observable behaviors (Pasquini et al., 2024; Xu et al., 2024; Yoon et al., 2025) — a notion that echoes Galton's Finger Prints (Galton, 1892): "A fingerprint is the pattern formed by friction-ridge skin on the fingertips." Existing fingerprints, however, are defined only after models have been fully trained and converged (i.e., after pretraining), e.g., by extracting patterns from parameters or generated text, and thus capture traits that emerge over the course of training. In this work, we propose a stricter notion of an LLM fingerprint: an intrinsic property present at model initialization and detectable at any point of the subsequent training. Extensive experiments in Section 5.1 show that our method can even distinguish between models that differ only in their initialization seed, despite an identical training pipeline and data order. Finally, in Section 5.2, we evaluate our method further. Classic techniques include backdoor attacks (Adi et al., 2018; Nasery et al.) and model weight watermarks, which embed identifiers into parameters. With white-box access, signatures are extracted from model weights, leveraging intrinsic properties such as the distribution of attention matrices (Yoon et al., 2025), the kernel alignment of internal representations (Zhang et al., 2024), or the stable direction of parameter vectors.

Figure 1: Initialization-born token bias persists through training.
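To build intuition for why an initialization seed alone can yield a distinguishable "token bias," here is a minimal, self-contained sketch (not the paper's actual method) assuming a toy setting: an untrained linear output head over a hypothetical vocabulary, probed with a fixed set of shared inputs. The histogram of argmax tokens acts as a seed-dependent fingerprint; all names and sizes (`vocab_size`, `d_model`, `n_probes`) are illustrative assumptions.

```python
import numpy as np

def token_bias_fingerprint(seed, vocab_size=100, d_model=32, n_probes=500):
    """Illustrative fingerprint: which tokens a freshly initialized
    (untrained) output head prefers on a shared set of probe inputs."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d_model, vocab_size))   # random LM head at init
    probe_rng = np.random.default_rng(0)             # probes shared across models
    probes = probe_rng.standard_normal((n_probes, d_model))
    top_tokens = (probes @ W).argmax(axis=1)         # preferred token per probe
    hist = np.bincount(top_tokens, minlength=vocab_size).astype(float)
    return hist / hist.sum()                         # token-bias distribution

def fingerprint_distance(a, b):
    """Total variation distance between two token-bias distributions."""
    return float(np.abs(a - b).sum()) / 2

fp_a = token_bias_fingerprint(seed=1)
fp_a_again = token_bias_fingerprint(seed=1)
fp_b = token_bias_fingerprint(seed=2)
# Same seed reproduces the fingerprint exactly; a different seed shifts
# the bias, so the distributions are distinguishable.
```

In this toy model the bias comes purely from which random weight columns happen to dominate at initialization; the paper's claim is that traces of such initialization-born bias remain detectable even after full training.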
Oct-1-2025
- Genre:
- Research Report > Experimental Study (0.33)
- Industry:
- Information Technology > Security & Privacy (1.00)