Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment
Cheng, Gang, Jin, Haibo, Zhang, Wenbin, Wang, Haohan, Zhuang, Jun
arXiv.org Artificial Intelligence
Large Language Models (LLMs) are increasingly integrated into financial applications, yet existing red-teaming research primarily targets harmful content and largely neglects regulatory risks. In this work, we investigate the vulnerability of financial LLMs through red-teaming. We introduce Risk-Concealment Attacks (RCA), a novel multi-turn framework that iteratively conceals regulatory risks to elicit seemingly compliant yet regulation-violating responses from LLMs. To enable systematic evaluation, we construct FIN-Bench, a domain-specific benchmark for assessing LLM safety in financial contexts. Extensive experiments on FIN-Bench demonstrate that RCA effectively bypasses nine mainstream LLMs, achieving an average attack success rate (ASR) of 93.18%, including 98.28% on GPT-4.1 and 97.56% on OpenAI o1. These findings reveal a critical gap in current alignment techniques and underscore the urgent need for stronger moderation mechanisms in financial domains. We hope this work offers practical insights toward robust, domain-aware LLM alignment.
Sep-16-2025
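The abstract reports an average attack success rate (ASR) of 93.18%. A minimal sketch of how such a metric is typically computed, as the fraction of attack attempts judged successful; the counts below are illustrative placeholders chosen to reproduce that percentage, not data from the paper:

```python
def attack_success_rate(outcomes):
    """Return the percentage of attack attempts marked successful.

    `outcomes` is a sequence of booleans, one per attempt
    (True = the model produced a policy-violating response).
    """
    if not outcomes:
        raise ValueError("no attack attempts recorded")
    return 100.0 * sum(outcomes) / len(outcomes)

# Illustrative example: 41 successful bypasses out of 44 attempts.
print(round(attack_success_rate([True] * 41 + [False] * 3), 2))  # 93.18
```

Per-model ASRs (e.g., the 98.28% on GPT-4.1) would be computed the same way over that model's attempts, with the overall figure averaged across models.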