Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models
