BAM-ICL: Causal Hijacking In-Context Learning with Budgeted Adversarial Manipulation
–Neural Information Processing Systems
Recent research shows that large language models (LLMs) are vulnerable to hijacking attacks in in-context learning (ICL), where an LLM performs a task by conditioning on a sequence of in-context examples (ICEs), i.e., prompts containing task-specific input-output pairs. Adversaries can manipulate the provided ICEs to steer the model toward attacker-specified outputs, effectively "hijacking" the model's decision-making process. Unlike traditional adversarial attacks that target single inputs, hijacking attacks on LLMs subtly manipulate the initial few examples to influence the model's behavior across a range of subsequent inputs, which requires distributed and stealthy perturbations. However, existing approaches overlook how to allocate the perturbation budget effectively across ICEs. We argue that fixed budgets miss the potential of dynamic reallocation to improve attack success while maintaining high stealthiness and text quality.
Jun-15-2026, 01:16:38 GMT
- Country:
- North America > United States (1.00)
- Asia (1.00)
- Europe (0.93)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Leisure & Entertainment (1.00)
- Law Enforcement & Public Safety > Terrorism (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Technology: