CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution

Shao, Minghao, Xi, Haoran, Rani, Nanda, Udeshi, Meet, Putrevu, Venkata Sai Charan, Milner, Kimberly, Dolan-Gavitt, Brendan, Shukla, Sandeep Kumar, Krishnamurthy, Prashanth, Khorrami, Farshad, Karri, Ramesh, Shafique, Muhammad

May-26-2025–arXiv.org Artificial Intelligence

Large Language Model (LLM) agents can automate cybersecurity tasks and can adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities on Capture-The-Flag (CTF) competitions, they have two key limitations: accessing latest cybersecurity expertise beyond training data, and integrating new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into the task-solving automation can tackle these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and knowledge-hint injection that transforms insights into adaptive attack strategies. Comprehensive evaluations with different configurations show CRAKEN's effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding new security knowledge into LLM-driven cybersecurity agentic systems. With a knowledge database of CTF writeups, CRAKEN obtained an accuracy of 22% on NYU CTF Bench, outperforming prior works by 3% and achieving state-of-the-art results. On evaluation of MITRE ATT&CK techniques, CRAKEN solves 25-30% more techniques than prior work, demonstrating improved cybersecurity capabilities via knowledge-based execution. We make our framework open source to public https://github.com/NYU-LLM-CTF/nyuctf_agents_craken.

knowledge management, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

May-26-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.28)
- Asia > Middle East (0.28)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government > Military
  - Cyberwarfare (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Knowledge Management > Knowledge Engineering (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Expert Systems (1.00)
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found