Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
