incident response
AgentBnB: A Browser-Based Cybersecurity Tabletop Exercise with Large Language Model Support and Retrieval-Aligned Scaffolding
Abstract--Traditional cybersecurity tabletop exercises (TTXs) provide valuable training but are often scripted, resource-intensive, and difficult to scale. We introduce AgentBnB, a browser-based re-imagining of the Backdoors & Breaches game that integrates large language model teammates with a Bloom-aligned, retrieval-augmented copilot (C2D2). The system expands a curated corpus into factual, conceptual, procedural, and metacognitive snippets, delivering on-demand, cognitively targeted hints. Prompt-engineered agents employ a scaffolding ladder that gradually fades as learner confidence grows. In a solo-player pilot with four graduate students, participants reported greater intention to use the agent-based version compared to the physical card deck and viewed it as more scalable, though a ceiling effect emerged on a simple knowledge quiz. Despite limitations of small sample size, single-player focus, and narrow corpus, these early findings suggest that large language model augmented TTXs can provide lightweight, repeatable practice without the logistical burden of traditional exercises. Planned extensions include multi-player modes, telemetry-driven coaching, and comparative studies with larger cohorts. Cybersecurity tabletop exercises (TTXs) [1]-[3] have long served as a core method of incident-response training, giving teams structured environments to practice technical and organizational decision-making under pressure.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.83)
- Education > Educational Setting > Higher Education (0.66)
- Education > Educational Technology > Educational Software > Computer Based Training (0.46)
AutoBnB-RAG: Enhancing Multi-Agent Incident Response with Retrieval-Augmented Generation
Incident response (IR) requires fast, coordinated, and well-informed decision-making to contain and mitigate cyber threats. While large language models (LLMs) have shown promise as autonomous agents in simulated IR settings, their reasoning is often limited by a lack of access to external knowledge. In this work, we present AutoBnB-RAG, an extension of the AutoBnB framework that incorporates retrieval-augmented generation (RAG) into multi-agent incident response simulations. Built on the Backdoors & Breaches (B&B) tabletop game environment, AutoBnB-RAG enables agents to issue retrieval queries and incorporate external evidence during collaborative investigations. We introduce two retrieval settings: one grounded in curated technical documentation (RAG-Wiki), and another using narrative-style incident reports (RAG-News). We evaluate performance across eight team structures, including newly introduced argumentative configurations designed to promote critical reasoning. To validate practical utility, we also simulate real-world cyber incidents based on public breach reports, demonstrating AutoBnB-RAG's ability to reconstruct complex multi-stage attacks. Our results show that retrieval augmentation improves decision quality and success rates across diverse organizational models. This work demonstrates the value of integrating retrieval mechanisms into LLM-based multi-agent systems for cybersecurity decision-making.
- Information Technology > Security & Privacy (1.00)
- Leisure & Entertainment > Games > Computer Games (0.48)
- Government > Military > Cyberwarfare (0.38)
Incident Response Planning Using a Lightweight Large Language Model with Reduced Hallucination
Hammar, Kim, Alpcan, Tansu, Lupu, Emil C.
Timely and effective incident response is key to managing the growing frequency of cyberattacks. However, identifying the right response actions for complex systems is a major technical challenge. A promising approach to mitigate this challenge is to use the security knowledge embedded in large language models (LLMs) to assist security operators during incident handling. Recent research has demonstrated the potential of this approach, but current methods are mainly based on prompt engineering of frontier LLMs, which is costly and prone to hallucinations. We address these limitations by presenting a novel way to use an LLM for incident response planning with reduced hallucination. Our method includes three steps: fine-tuning, information retrieval, and lookahead planning. We prove that our method generates response plans with a bounded probability of hallucination and that this probability can be made arbitrarily small at the expense of increased planning time under certain assumptions. Moreover, we show that our method is lightweight and can run on commodity hardware. We evaluate our method on logs from incidents reported in the literature. The experimental results show that our method a) achieves up to 22% shorter recovery times than frontier LLMs and b) generalizes to a broad range of incident types and response actions.
- North America > United States > California > San Diego County > San Diego (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (8 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.66)
DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response
Cherif, Bilel, Bisztray, Tamas, Dubniczky, Richard A., Aldahmani, Aaesha, Alshehhi, Saeed, Tihanyi, Norbert
Digital Forensics and Incident Response (DFIR) involves analyzing digital evidence to support legal investigations. Large Language Models (LLMs) offer new opportunities in DFIR tasks such as log analysis and memory forensics, but their susceptibility to errors and hallucinations raises concerns in high-stakes contexts. Despite growing interest, there is no comprehensive benchmark to evaluate LLMs across both theoretical and practical DFIR domains. To address this gap, we present DFIR-Metric, a benchmark with three components: (1) Knowledge Assessment: a set of 700 expert-reviewed multiple-choice questions sourced from industry-standard certifications and official documentation; (2) Realistic Forensic Challenges: 150 CTF-style tasks testing multi-step reasoning and evidence correlation; and (3) Practical Analysis: 500 disk and memory forensics cases from the NIST Computer Forensics Tool Testing Program (CFTT). We evaluated 14 LLMs using DFIR-Metric, analyzing both their accuracy and consistency across trials. We also introduce a new metric, the Task Understanding Score (TUS), designed to more effectively evaluate models in scenarios where they achieve near-zero accuracy. This benchmark offers a rigorous, reproducible foundation for advancing AI in digital forensics. All scripts, artifacts, and results are available on the project website at https://github.com/DFIR-Metric .
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- (6 more...)
Multi-Agent Collaboration in Incident Response with Large Language Models
Incident response (IR) is a critical aspect of cybersecurity, requiring rapid decision-making and coordinated efforts to address cyberattacks effectively. Leveraging large language models (LLMs) as intelligent agents offers a novel approach to enhancing collaboration and efficiency in IR scenarios. This paper explores the application of LLM-based multi-agent collaboration using the Backdoors & Breaches framework, a tabletop game designed for cybersecurity training. We simulate real-world IR dynamics through various team structures, including centralized, decentralized, and hybrid configurations. By analyzing agent interactions and performance across these setups, we provide insights into optimizing multi-agent collaboration for incident response. Our findings highlight the potential of LLMs to enhance decision-making, improve adaptability, and streamline IR processes, paving the way for more effective and coordinated responses to cyber threats.
- Overview (1.00)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
Exploring reinforcement learning for incident response in autonomous military vehicles
Madsen, Henrik, Grov, Gudmund, Mancini, Federico, Baksaas, Magnus, Sommervoll, Åvald Åslaugson
Unmanned vehicles able to conduct advanced operations without human intervention are being developed at a fast pace for many purposes. Not surprisingly, they are also expected to significantly change how military operations can be conducted. To leverage the potential of this new technology in a physically and logically contested environment, security risks are to be assessed and managed accordingly. Research on this topic points to autonomous cyber defence as one of the capabilities that may be needed to accelerate the adoption of these vehicles for military purposes. Here, we pursue this line of investigation by exploring reinforcement learning to train an agent that can autonomously respond to cyber attacks on unmanned vehicles in the context of a military operation. We first developed a simple simulation environment to quickly prototype and test some proof-of-concept agents for an initial evaluation. This agent was then applied to a more realistic simulation environment and finally deployed on an actual unmanned ground vehicle for even more realism. A key contribution of our work is demonstrating that reinforcement learning is a viable approach to train an agent that can be used for autonomous cyber defence on a real unmanned ground vehicle, even when trained in a simple simulation environment.
- Europe > Ukraine (0.05)
- Asia > Russia (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.75)
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models
Loumachi, Fatma Yasmine, Ghanem, Mohamed Chahine
Timeline Analysis (TA) plays a crucial role in Timeline Forensics (TF) within the field of Digital Forensics (DF). It focuses on examining and analyzing time-based digital artefacts, such as timestamps derived from event logs, file metadata, and other relevant data, to correlate events linked to cyber incidents and reconstruct their chronological sequence. Traditional tools often struggle to efficiently handle the large volume and variety of data generated during DF investigations and Incident Response (IR) processes. This paper introduces a novel framework, GenDFIR, which combines Rule-Based Artificial Intelligence (R-BAI) algorithms with Large Language Models (LLMs) to enhance and automate the TA process. The proposed approach consists of two key stages: (1) R-BAI is used to identify and select anomalous digital artefacts based on predefined rules. (2) The selected artefacts are then transformed into embeddings for processing by an LLM with the assistance of a Retrieval-Augmented Generation (RAG) agent. The LLM uses its capabilities to perform automated TA on the artefacts and predict potential incident outcomes. To validate the framework, we evaluated its performance, efficiency, and reliability. Several metrics were applied to simulated cyber incident scenarios, which were presented as forensic case documents. Our findings demonstrate the significant potential of integrating R-BAI and LLMs for TA. This innovative approach underscores the power of Generative AI (GenAI), particularly LLMs, and opens up new possibilities for advanced threat detection and incident reconstruction, marking a significant advancement in the field.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence
Grigorev, Artur, Saleh, Adriana-Simona Mihaita Khaled, Ou, Yuming
The proposed IncidentResponseGPT framework - a novel system that applies generative artificial intelligence (AI) to potentially enhance the efficiency and effectiveness of traffic incident response. This model allows for synthesis of region-specific incident response guidelines and generates incident response plans adapted to specific area, aiming to expedite decision-making for traffic management authorities. This approach aims to accelerate incident resolution times by suggesting various recommendations (e.g. optimal rerouting strategies, estimating resource needs) to minimize the overall impact on the urban traffic network. The system suggests specific actions, including dynamic lane closures, optimized rerouting and dispatching appropriate emergency resources. IncidentResponseGPT employs the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to rank generated response plans based on criteria like impact minimization and resource efficiency based on their proximity to an human-proposed solution.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New York > Onondaga County (0.04)
- North America > United States > Hawaii (0.04)
- North America > United States > Alaska (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Government (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.85)
Anomaly Detection for Incident Response at Scale
Wang, Hanzhang, Tangirala, Gowtham Kumar, Naidu, Gilkara Pranav, Mayville, Charles, Roy, Arighna, Sun, Joanne, Mandava, Ramesh Babu
We present a machine learning-based anomaly detection product, AI Detect and Respond (AIDR), that monitors Walmart's business and system health in real-time. During the validation over 3 months, the product served predictions from over 3000 models to more than 25 application, platform, and operation teams, covering 63\% of major incidents and reducing the mean-time-to-detect (MTTD) by more than 7 minutes. Unlike previous anomaly detection methods, our solution leverages statistical, ML and deep learning models while continuing to incorporate rule-based static thresholds to incorporate domain-specific knowledge. Both univariate and multivariate ML models are deployed and maintained through distributed services for scalability and high availability. AIDR has a feedback loop that assesses model quality with a combination of drift detection algorithms and customer feedback. It also offers self-onboarding capabilities and customizability. AIDR has achieved success with various internal teams with lower time to detection and fewer false positives than previous methods. As we move forward, we aim to expand incident coverage and prevention, reduce noise, and integrate further with root cause recommendation (RCR) to enable an end-to-end AIDR experience.
- North America > United States > California > San Diego County > San Diego (0.05)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- North America > United States > California > Santa Clara County > Sunnyvale (0.05)
- (3 more...)
- Research Report (0.50)
- Workflow (0.46)
A Comprehensive Analysis of the Role of Artificial Intelligence and Machine Learning in Modern Digital Forensics and Incident Response
Dunsin, Dipo, Ghanem, Mohamed C., Ouazzane, Karim, Vassilev, Vassil
In the dynamic landscape of digital forensics, the integration of Artificial Intelligence (AI) and Machine Learning (ML) stands as a transformative technology, poised to amplify the efficiency and precision of digital forensics investigations. However, the use of ML and AI in digital forensics is still in its nascent stages. As a result, this paper gives a thorough and in-depth analysis that goes beyond a simple survey and review. The goal is to look closely at how AI and ML techniques are used in digital forensics and incident response. This research explores cutting-edge research initiatives that cross domains such as data collection and recovery, the intricate reconstruction of cybercrime timelines, robust big data analysis, pattern recognition, safeguarding the chain of custody, and orchestrating responsive strategies to hacking incidents. This endeavour digs far beneath the surface to unearth the intricate ways AI-driven methodologies are shaping these crucial facets of digital forensics practice. While the promise of AI in digital forensics is evident, the challenges arising from increasing database sizes and evolving criminal tactics necessitate ongoing collaborative research and refinement within the digital forensics profession. This study examines the contributions, limitations, and gaps in the existing research, shedding light on the potential and limitations of AI and ML techniques. By exploring these different research areas, we highlight the critical need for strategic planning, continual research, and development to unlock AI's full potential in digital forensics and incident response. Ultimately, this paper underscores the significance of AI and ML integration in digital forensics, offering insights into their benefits, drawbacks, and broader implications for tackling modern cyber threats.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Africa > Nigeria (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (9 more...)
- Research Report > Promising Solution (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview > Innovation (0.87)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.47)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science > Data Mining > Big Data (0.89)