Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
