ICLScan: Detecting Backdoors in Black-Box Large Language Models via Targeted In-context Illumination
