BAM-ICL: Causal Hijacking In-Context Learning with Budgeted Adversarial Manipulation

Jun-15-2026, 01:16:38 GMT–Neural Information Processing Systems

Recent research shows that large language models (LLMs) are vulnerable to hijacking attacks in in-context learning (ICL), where LLMs perform tasks by conditioning on a sequence of in-context examples (ICEs), i.e., prompts with task-specific input-output pairs. Adversaries can manipulate the provided ICEs to steer the model toward attacker-specified outputs, effectively "hijacking" the model's decision-making process. Unlike traditional adversarial attacks that target single inputs, hijacking attacks on LLMs subtly manipulate the initial few examples to influence the model's behavior across a range of subsequent inputs, which requires distributed and stealthy perturbations. However, existing approaches overlook how to allocate the perturbation budget effectively across ICEs. We argue that fixed budgets miss the potential of dynamic reallocation to improve attack success while maintaining high stealthiness and text quality.
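The contrast between a fixed budget and dynamic reallocation can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not describe. It assumes a hypothetical integer edit budget (for example, a number of token changes) and hypothetical per-ICE influence scores, and simply splits the budget in proportion to those scores. The function names `uniform_budget` and `reallocated_budget` are illustrative only.

```python
from typing import List


def uniform_budget(total: int, n_examples: int) -> List[int]:
    """Fixed baseline: split the total edit budget evenly across examples."""
    base, extra = divmod(total, n_examples)
    # The first `extra` examples absorb the remainder, one unit each.
    return [base + (1 if i < extra else 0) for i in range(n_examples)]


def reallocated_budget(total: int, scores: List[float]) -> List[int]:
    """Hypothetical dynamic split: budget proportional to an influence score.

    `scores` is an assumed, non-negative per-example estimate of how much
    perturbing that example shifts the model's output.
    """
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    mass = sum(scores)
    if mass == 0:
        # No signal about influence: fall back to the fixed baseline.
        return uniform_budget(total, len(scores))
    shares = [total * s / mass for s in scores]
    alloc = [int(x) for x in shares]  # floor, since shares are non-negative
    # Largest-remainder rounding so the allocations sum to `total`.
    leftover = total - sum(alloc)
    order = sorted(
        range(len(scores)),
        key=lambda i: shares[i] - alloc[i],
        reverse=True,
    )
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

With the same total, the uniform split spends equally on every ICE, while the proportional split concentrates edits on the examples assumed to matter most and leaves the others nearly untouched. How influence is actually estimated, and how stealthiness and text quality constrain the budget, is left open here.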

large language model, machine learning, natural language, (17 more...)

Country:
- North America > United States (1.00)
- Asia (1.00)
- Europe (0.93)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Leisure & Entertainment (1.00)
- Law Enforcement & Public Safety > Terrorism (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
