How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Ma, Zihan; Zhu, Dongsheng; Liu, Shudong; Zhang, Taolin; Liu, Junnan; Li, Qingqiu; Luo, Minnan; Zhang, Songyang; Chen, Kai

Nov-12-2025, arXiv.org Artificial Intelligence

Current safety evaluations for LLM-driven agents primarily focus on atomic harms, failing to address sophisticated threats where malicious intent is concealed or diluted within complex tasks. We address this gap with a two-dimensional analysis of agent safety brittleness under the orthogonal pressures of intent concealment and task complexity. To enable this, we introduce OASIS (Orthogonal Agent Safety Inquiry Suite), a hierarchical benchmark with fine-grained annotations and a high-fidelity simulation sandbox. Our findings reveal two critical phenomena: safety alignment degrades sharply and predictably as intent becomes obscured, and a "Complexity Paradox" emerges, where agents seem safer on harder tasks only due to capability limitations. By releasing OASIS and its simulation environment, we provide a principled foundation for probing and strengthening agent safety in these overlooked dimensions.

large language model, machine learning, natural language

Country:
- Asia > China (0.05)

Genre:
- Research Report (0.71)

Industry:
- Information Technology > Security & Privacy (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.52)
  - Natural Language > Large Language Model (0.96)