SAEMark: Steering Personalized Multilingual LLM Watermarks with Sparse Autoencoders

Neural Information Processing Systems 

Watermarking LLM-generated text is critical for content attribution and misinformation prevention, yet existing methods compromise text quality and require white-box model access for logit manipulation or training, which excludes API-based models and multilingual scenarios.