mitre
- North America > United States > New York > Monroe County > Rochester (0.04)
- Europe > Middle East (0.04)
- Asia > Middle East > Iran (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.48)
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Alam, Md Tanvirul, Bhusal, Dipkamal, Ahmad, Salman, Rastogi, Nidhi, Worth, Peter
Large Language Models (LLMs) have demonstrated strong capabilities in natural language reasoning, yet their application to Cyber Threat Intelligence (CTI) remains limited. CTI analysis involves distilling large volumes of unstructured reports into actionable knowledge, a process where LLMs could substantially reduce analyst workload. CTIBench introduced a comprehensive benchmark for evaluating LLMs across multiple CTI tasks. In this work, we extend CTIBench by developing AthenaBench, an enhanced benchmark that includes an improved dataset creation pipeline, duplicate removal, refined evaluation metrics, and a new task focused on risk mitigation strategies. We evaluate twelve LLMs, including state-of-the-art proprietary models such as GPT-5 and Gemini-2.5 Pro, alongside seven open-source models from the LLaMA and Qwen families. While proprietary LLMs achieve stronger results overall, their performance remains subpar on reasoning-intensive tasks, such as threat actor attribution and risk mitigation, with open-source models trailing even further behind. These findings highlight fundamental limitations in the reasoning capabilities of current LLMs and underscore the need for models explicitly tailored to CTI workflows and automation.
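The abstract lists duplicate removal as one of AthenaBench's pipeline improvements but does not say which matching criterion is used. A minimal sketch, assuming duplicates are benchmark items that become identical after lowercasing, removing punctuation, and collapsing whitespace. The helper names and sample questions are hypothetical. Catching paraphrased near-duplicates would require fuzzy or embedding-based matching, which this sketch does not attempt.

```python
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe(items: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each item, comparing normalized text."""
    seen = set()
    kept = []
    for item in items:
        key = normalize(item)
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


# Hypothetical benchmark questions; the first two differ only in case,
# punctuation, and spacing.
questions = [
    "Which tactic does T1566 belong to?",
    "which tactic does  T1566 belong to",
    "Which group uses T1059?",
]
unique = dedupe(questions)
print(unique)
```

Keeping the first occurrence preserves the original item ordering, which makes the filtered dataset easy to diff against the unfiltered one.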
- North America > United States > New York > Monroe County > Rochester (0.04)
- North America > United States > Florida > Palm Beach County > Jupiter (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
CTIArena: Benchmarking LLM Knowledge and Reasoning Across Heterogeneous Cyber Threat Intelligence
Cheng, Yutong, Liu, Yang, Li, Changze, Song, Dawn, Gao, Peng
Cyber threat intelligence (CTI) is central to modern cybersecurity, providing critical insights for detecting and mitigating evolving threats. With the natural language understanding and reasoning capabilities of large language models (LLMs), there is increasing interest in applying them to CTI, which calls for benchmarks that can rigorously evaluate their performance. Several early efforts have studied LLMs on selected CTI tasks, but they remain limited: (i) they adopt only closed-book settings, relying on parametric knowledge without leveraging CTI knowledge bases; (ii) they cover only a narrow set of tasks, lacking a systematic view of the CTI landscape; and (iii) they restrict evaluation to single-source analysis, unlike realistic scenarios that require reasoning across multiple sources. To fill these gaps, we present CTIArena, the first benchmark for evaluating LLM performance on heterogeneous, multi-source CTI under knowledge-augmented settings. CTIArena spans three categories (structured, unstructured, and hybrid), further divided into nine tasks that capture the breadth of CTI analysis in modern security operations. We evaluate ten widely used LLMs and find that most struggle in closed-book setups but show noticeable gains when augmented with security-specific knowledge through our designed retrieval-augmented techniques. These findings highlight the limitations of general-purpose LLMs and the need for domain-tailored techniques to fully unlock their potential for CTI.
- Asia > Middle East > Iraq (0.28)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Cuba (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.48)
OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models
Cotti, Luca, Drago, Idilio, Rula, Anisa, Bianchini, Devis, Cerutti, Federico
System logs represent a valuable source of Cyber Threat Intelligence (CTI), capturing attacker behaviors, exploited vulnerabilities, and traces of malicious activity. Yet their utility is often limited by lack of structure, semantic inconsistency, and fragmentation across devices and sessions. Extracting actionable CTI from logs therefore requires approaches that can reconcile noisy, heterogeneous data into coherent and interoperable representations. We introduce OntoLogX, an autonomous Artificial Intelligence (AI) agent that leverages Large Language Models (LLMs) to transform raw logs into ontology-grounded Knowledge Graphs (KGs). OntoLogX integrates a lightweight log ontology with Retrieval Augmented Generation (RAG) and iterative correction steps, ensuring that generated KGs are syntactically and semantically valid. Beyond event-level analysis, the system aggregates KGs into sessions and employs an LLM to predict MITRE ATT&CK tactics, linking low-level log evidence to higher-level adversarial objectives. We evaluate OntoLogX on logs from both a public benchmark and a real-world honeypot dataset, demonstrating robust KG generation across multiple KG backends and accurate mapping of adversarial activity to ATT&CK tactics. Results highlight the benefits of retrieval and correction for precision and recall, the effectiveness of code-oriented models in structured log analysis, and the value of ontology-grounded representations for actionable CTI extraction.
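The "iterative correction steps" described above follow a generate, validate, and retry pattern. A minimal sketch, assuming a tiny hypothetical ontology that fixes each predicate's subject and object classes, nodes written as `Class:id`, and a stub `fake_llm` standing in for the real model call. OntoLogX's actual ontology, validator, and prompts are not reproduced here, and RAG retrieval is omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); nodes written "Class:id"

# Hypothetical mini log ontology: predicate -> (subject class, object class).
ONTOLOGY: Dict[str, Tuple[str, str]] = {
    "hasSourceIP": ("Event", "IPAddress"),
    "executedCommand": ("Event", "Command"),
    "partOfSession": ("Event", "Session"),
}


def node_class(node: str) -> str:
    """Return the class prefix of a node written as 'Class:id'."""
    return node.split(":", 1)[0]


def validate(triples: List[Triple]) -> List[str]:
    """Check each triple against the ontology and return readable error messages."""
    errors = []
    for s, p, o in triples:
        if p not in ONTOLOGY:
            errors.append(f"unknown predicate '{p}'")
            continue
        want_s, want_o = ONTOLOGY[p]
        if node_class(s) != want_s or node_class(o) != want_o:
            errors.append(f"'{p}' expects {want_s} -> {want_o}, got {s} -> {o}")
    return errors


Generator = Callable[[str, List[str]], List[Triple]]


def extract_kg(
    log_line: str, generate: Generator, max_rounds: int = 3
) -> Tuple[Optional[List[Triple]], int]:
    """Generate triples, validate them, and feed errors back until valid or out of rounds."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        triples = generate(log_line, feedback)
        feedback = validate(triples)
        if not feedback:
            return triples, round_no
    return None, max_rounds


def fake_llm(log_line: str, feedback: List[str]) -> List[Triple]:
    """Stand-in for an LLM call: the first answer uses a wrong predicate, then it corrects."""
    pred = "sourceIp" if not feedback else "hasSourceIP"
    return [
        ("Event:e1", pred, "IPAddress:203.0.113.5"),
        ("Event:e1", "executedCommand", "Command:wget"),
    ]


triples, rounds = extract_kg("session 42: 203.0.113.5 executed wget", fake_llm)
print(rounds, triples)
```

Returning the validator's messages as feedback gives the generator concrete, ontology-level reasons to revise its output, rather than a bare pass/fail signal.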
- Europe > United Kingdom > England > Hampshire > Southampton (0.14)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Real-Time RAG for the Identification of Supply Chain Vulnerabilities
Ponnock, Jesse, Kenneally, Grace, Briggs, Michael Robert, Yeo, Elinor, Patterson, Tyrone III, Kinberg, Nicholas, Kalinowski, Matthew, Hechtman, David
New technologies in generative AI can enable deeper analysis of our nation's supply chains, but truly informative insights require continually updating and aggregating massive amounts of data in a timely manner. Large Language Models (LLMs) offer unprecedented analytical opportunities; however, their knowledge is constrained to the model's last training date, rendering these capabilities unusable for organizations whose missions depend on emerging and timely information. This research proposes an innovative approach to supply chain analysis that integrates emerging Retrieval-Augmented Generation (RAG) preprocessing and retrieval techniques with advanced web-scraping technologies. Our method aims to reduce the latency of incorporating new information into an augmented LLM, enabling timely analysis of supply chain disruptors. Through experimentation, this study evaluates the combinatorial effects of these techniques on timeliness and quality trade-offs. Our results suggest that, in applying RAG systems to supply chain analysis, fine-tuning the embedding retrieval model consistently provides the most significant performance gains, underscoring the critical importance of retrieval quality. Adaptive iterative retrieval, which dynamically adjusts retrieval depth based on context, further enhances performance, especially on complex supply chain queries. Conversely, fine-tuning the LLM yields limited improvements at higher resource cost, while downward query abstraction significantly outperforms upward abstraction in practice.
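Adaptive iterative retrieval, as summarized above, means the retriever does not commit to a fixed number of passages. A minimal sketch of the idea, assuming that depth `k` doubles each round (capped at `max_k`) until a sufficiency check passes. The keyword-overlap ranker and the `covers_all_terms` stopping rule are hypothetical stand-ins; the paper's system would use its fine-tuned embedding retriever and its own context-based criterion.

```python
from typing import Callable, List, Sequence


def adaptive_retrieve(
    query: str,
    corpus: Sequence[str],
    is_sufficient: Callable[[str, List[str]], bool],
    start_k: int = 1,
    max_k: int = 8,
) -> List[str]:
    """Grow retrieval depth k until the context is judged sufficient or k hits max_k."""
    if start_k < 1:
        raise ValueError("start_k must be at least 1")
    q_terms = set(query.lower().split())
    # Toy relevance score: number of query terms shared with each document.
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    k = start_k
    while True:
        context = list(ranked[:k])
        if is_sufficient(query, context) or k >= max_k or k >= len(ranked):
            return context
        k = min(k * 2, max_k)  # harder queries get a deeper retrieval pass


def covers_all_terms(query: str, context: List[str]) -> bool:
    """Hypothetical stopping rule: every query term appears somewhere in the context."""
    seen = set(" ".join(context).lower().split())
    return set(query.lower().split()) <= seen


docs = [
    "port closure delays semiconductor shipments",
    "supplier bankruptcy hits semiconductor packaging",
    "weather report for the weekend",
    "new tariffs on rare earth magnets",
]
ctx = adaptive_retrieve("semiconductor shipments supplier bankruptcy", docs, covers_all_terms)
print(ctx)
```

Here the top-ranked document alone misses "shipments", so the loop deepens to two documents and stops. A simple query would stop after the first round, which is the cost saving the adaptive scheme is after.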
- Asia > China (0.04)
- North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance (1.00)
ThreatGPT: An Agentic AI Framework for Enhancing Public Safety through Threat Modeling
Zisad, Sharif Noor, Hasan, Ragib
As our cities and communities become smarter, the systems that keep us safe, such as traffic control centers, emergency response networks, and public transportation, also become more complex. With this complexity comes a greater risk of security threats that can affect not just machines but real people's lives. To address this challenge, we present ThreatGPT, an agentic Artificial Intelligence (AI) assistant built to help people, whether engineers, safety officers, or policymakers, understand and analyze threats in public safety systems. Instead of requiring deep cybersecurity expertise, it allows users to simply describe the components of a system they are concerned about, such as login systems, data storage, or communication networks. Then, with the click of a button, users can choose how they want the system to be analyzed, using popular frameworks and sources such as STRIDE, MITRE ATT&CK, CVE reports, NIST, or CISA. ThreatGPT is unique because it does not just provide threat information; it acts as a knowledgeable partner. Using few-shot learning, the AI learns from examples and generates relevant threat models. It can highlight what might go wrong, how attackers could take advantage, and what can be done to prevent harm. Whether securing a city's infrastructure or a local health service, the tool adapts to users' needs. In simple terms, ThreatGPT brings together AI and human judgment to make our public systems safer. It is designed not just to analyze threats but to empower people to understand and act on them faster, smarter, and with more confidence.
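Few-shot learning in this setting amounts to placing worked examples in the prompt ahead of the user's system description. ThreatGPT's actual prompt is not given. The following minimal sketch shows only how such a prompt might be assembled for the STRIDE option. The function name, wording, and example are hypothetical, and the call that sends the prompt to a model is omitted. The six STRIDE categories are the standard ones.

```python
from typing import List, Tuple

# The six standard STRIDE threat categories.
STRIDE = [
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
]


def build_few_shot_prompt(
    framework: str,
    examples: List[Tuple[str, str]],
    components: List[str],
) -> str:
    """Assemble instructions, worked (system, threat model) examples, then the new system."""
    lines = [f"You are a threat-modeling assistant. Analyze the system using {framework}."]
    if framework == "STRIDE":
        lines.append("Consider each category: " + ", ".join(STRIDE) + ".")
    for i, (system, threat_model) in enumerate(examples, start=1):
        lines.append(f"Example {i} system: {system}")
        lines.append(f"Example {i} threat model: {threat_model}")
    lines.append("New system components: " + "; ".join(components))
    # Ending on an open slot invites the model to continue in the examples' format.
    lines.append("New system threat model:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "STRIDE",
    [(
        "Traffic signal controller with remote login",
        "Spoofing: stolen operator credentials allow fake commands; mitigate with MFA.",
    )],
    ["login system", "incident database", "radio network"],
)
print(prompt)
```

Swapping the `framework` argument while keeping the same example structure is one plausible way to support the one-click choice between frameworks that the abstract describes.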
- North America > United States > Alabama > Jefferson County > Birmingham (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government > Military > Cyberwarfare (0.52)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
KillChainGraph: ML Framework for Predicting and Mapping ATT&CK Techniques
Singh, Chitraksh, Dhanraj, Monisha, Huang, Ken
The escalating complexity and volume of cyber-attacks demand proactive detection strategies that go beyond traditional rule-based systems. This paper presents a phase-aware, multi-model machine learning framework that emulates adversarial behavior across the seven phases of the Cyber Kill Chain using the MITRE ATT&CK Enterprise dataset. Techniques are semantically mapped to phases via ATTACK-BERT, producing seven phase-specific datasets. We evaluate LightGBM, a custom Transformer encoder, fine-tuned BERT, and a Graph Neural Network (GNN), integrating their outputs through a weighted soft voting ensemble. Inter-phase dependencies are modeled using directed graphs to capture attacker movement from reconnaissance to objectives. The ensemble consistently achieved the highest scores, with F1-scores ranging from 97.47% to 99.83%, surpassing GNN performance (97.36% to 99.81%) by 0.03% to 0.20%. This graph-driven, ensemble-based approach enables interpretable attack path forecasting and strengthens proactive cyber defense.
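Weighted soft voting, the ensembling step named above, takes a weighted average of each model's class-probability vector and predicts the class with the highest combined score. A minimal sketch, assuming all four models score the same candidate techniques. The probabilities and weights below are hypothetical placeholders, not values from the paper.

```python
from typing import List, Sequence, Tuple


def weighted_soft_vote(
    probs_per_model: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[int, List[float]]:
    """Combine per-model class probabilities with a weighted average.

    probs_per_model[m][c] is model m's probability for class c.
    Returns (index of the winning class, combined probability vector).
    """
    if len(probs_per_model) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probs_per_model[0])
    combined = [0.0] * n_classes
    for probs, w in zip(probs_per_model, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for c, p in enumerate(probs):
            combined[c] += w * p
    # Dividing by the weight total keeps the result a probability distribution.
    combined = [s / total for s in combined]
    best = max(range(n_classes), key=combined.__getitem__)
    return best, combined


# Hypothetical outputs of four models (LightGBM, Transformer, BERT, GNN)
# over three candidate techniques for one kill-chain phase.
model_probs = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.55, 0.35, 0.10],
]
weights = [1.0, 1.0, 1.0, 2.0]  # hypothetical: the GNN counts double
label, scores = weighted_soft_vote(model_probs, weights)
print(label, [round(s, 2) for s in scores])
```

Unlike hard (majority) voting, soft voting lets a confident model outweigh several uncertain ones. In this example the third model's vote for class 1 is outweighed by the other three.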
- North America > United States (0.28)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Asia > India > Maharashtra > Mumbai (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization
Jiang, Yuning, Oo, Nay, Meng, Qiaoran, Lin, Lu, Niyato, Dusit, Xiong, Zehui, Lim, Hoon Wei, Sikdar, Biplab
Modern cyber attacks unfold through multiple stages, requiring defenders to dynamically prioritize mitigations under uncertainty. While game-theoretic models capture attacker-defender interactions, existing approaches often rely on static assumptions and lack integration with real-time threat intelligence, limiting their adaptability. This paper presents CyGATE, a game-theoretic framework that models attacker-defender interactions and uses large language models (LLMs) with retrieval-augmented generation (RAG) to enhance tactic selection and patch prioritization. Applied to a two-agent scenario, CyGATE frames cyber conflicts as a partially observable stochastic game (POSG) across Cyber Kill Chain stages. Both agents use belief states to navigate uncertainty, with the attacker adapting tactics and the defender re-prioritizing patches based on evolving risks and observed adversary behavior. The framework's flexible architecture enables extension to multi-agent scenarios involving coordinated attackers, collaborative defenders, or complex enterprise environments with multiple stakeholders.

The evolving cybersecurity landscape presents increasingly sophisticated threats that necessitate adaptive, proactive defense strategies. Patch management, a cornerstone of cyber defense, requires intelligent prioritization of vulnerabilities under resource constraints such as maintenance windows and operational cost [1], [2]. However, traditional scoring systems like the Common Vulnerability Scoring System (CVSS) [3] fail to capture the evolving nature of cyber threats, where attackers adapt their strategies based on defender actions. Game theory provides a structured framework for modeling attacker-defender interactions [4], with chained or multistage games particularly suited to representing complex attack progressions along the Cyber Kill Chain (CKC) [5][6][7]. These models allow defenders to reason about long-term risks and preempt cascading compromises.
Despite these advancements, existing models remain constrained by fixed strategies, static payoff structures, and minimal integration of threat intelligence, failing to dynamically prioritize vulnerabilities based on evolving exploitation trends [8]. Traditional game-theoretic approaches typically use predefined rules to analyze strategies and are therefore limited in dynamic cyber environments where adversaries continuously adapt, operate under uncertainty, and employ unpredictable tactics [9].
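In a POSG, each agent's belief state is a probability distribution over hidden states, and it is refreshed after every action and observation. The excerpt does not give CyGATE's exact update. The following minimal sketch shows the standard Bayesian belief update for a single agent, assuming discrete states, a known transition model (with the opponent's behavior folded into it), and a known observation likelihood. The kill-chain stage names and all probabilities are hypothetical.

```python
from typing import Dict

State = str
Belief = Dict[State, float]


def update_belief(
    belief: Belief,
    transition: Dict[State, Dict[State, float]],
    obs_likelihood: Dict[State, float],
) -> Belief:
    """One Bayesian belief update after taking an action and receiving an observation.

    transition[s][s2] = P(s2 | s, action)
    obs_likelihood[s2] = P(observation | s2, action)
    Result: b'(s2) proportional to P(observation | s2) * sum_s P(s2 | s) * b(s).
    """
    # Prediction step: push the current belief through the transition model.
    predicted: Belief = {}
    for s, p_s in belief.items():
        for s2, p_trans in transition.get(s, {}).items():
            predicted[s2] = predicted.get(s2, 0.0) + p_trans * p_s
    # Correction step: weight each predicted state by how well it explains the observation.
    unnormalized = {s2: obs_likelihood.get(s2, 0.0) * p for s2, p in predicted.items()}
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("observation is impossible under the current belief")
    return {s2: v / z for s2, v in unnormalized.items()}


# Hypothetical defender belief over the attacker's current kill-chain stage.
belief = {"recon": 0.8, "exploit": 0.2, "actions": 0.0}
transition = {
    "recon": {"recon": 0.6, "exploit": 0.4},
    "exploit": {"exploit": 0.5, "actions": 0.5},
    "actions": {"actions": 1.0},
}
# Hypothetical P(intrusion alert | stage) for one observed alert.
alert_likelihood = {"recon": 0.1, "exploit": 0.7, "actions": 0.3}

posterior = update_belief(belief, transition, alert_likelihood)
print({s: round(p, 3) for s, p in posterior.items()})
```

A single alert shifts most of the defender's belief from reconnaissance to exploitation. A patch-prioritization policy could then act on this shift, for example by favoring fixes that block the exploitation stage.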
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.91)
Towards a Multi-Agent Simulation of Cyber-attackers and Cyber-defenders Battles
Soulé, Julien, Jamont, Jean-Paul, Occello, Michel, Théron, Paul, Traonouez, Louis-Marie
As cyber-attacks become increasingly complex and coordinated, multi-agent cyber-defense strategies could be key to countering attacks as close as possible to the entry points of a networked system. This paper presents a Markovian model, implemented in a simulator, of cyber-attacker agents and cyber-defender agents fighting each other on host network nodes. It aims to provide an experimental framework for implementing realistic coordinated cyber-attack scenarios while assessing dynamic organizations of cyber-defenders. We abstract network nodes as sets of properties, including those of the agents deployed on them. Actions applied by agents model how the network reacts in a given state and which properties change as a result. The agents' collective choice of actions brings the whole environment closer to or farther from the respective goals of the cyber-attackers and cyber-defenders. Using the simulator, we implemented a realistically inspired scenario with several approaches to implementing cyber-defender and cyber-attacker behavior.
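The model above treats each node as a set of properties and each action as something that changes those properties when it applies. A minimal sketch of that abstraction, assuming actions carry hypothetical `requires`/`adds`/`removes` property sets and a success probability, and that a goal is a set of properties that must hold on given nodes. The class, field, and property names are illustrative and are not the simulator's actual API.

```python
import random
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set

State = Dict[str, FrozenSet[str]]  # node name -> properties currently true on that node


@dataclass(frozen=True)
class Action:
    """Hypothetical agent action on one node: preconditions plus property changes."""
    name: str
    node: str
    requires: FrozenSet[str] = frozenset()
    adds: FrozenSet[str] = frozenset()
    removes: FrozenSet[str] = frozenset()
    success_prob: float = 1.0


def step(state: State, actions: List[Action], rng: random.Random) -> State:
    """Apply all agents' chosen actions jointly to produce the next state.

    The result depends only on the current state and the chosen actions (the
    Markov property). Preconditions are checked against the current state, and
    effects are applied in list order, so later actions win on conflicts.
    """
    nxt: Dict[str, Set[str]] = {node: set(props) for node, props in state.items()}
    for a in actions:
        if a.requires <= state[a.node] and rng.random() < a.success_prob:
            nxt[a.node] -= a.removes
            nxt[a.node] |= a.adds
    return {node: frozenset(props) for node, props in nxt.items()}


def goal_satisfied(state: State, goal: Dict[str, FrozenSet[str]]) -> bool:
    """True when every node named in `goal` has all of the goal's properties."""
    return all(props <= state[node] for node, props in goal.items())


state: State = {"web": frozenset({"vulnerable"}), "db": frozenset()}
exploit = Action("exploit", "web", requires=frozenset({"vulnerable"}),
                 adds=frozenset({"compromised"}))
patch = Action("patch", "web", requires=frozenset({"vulnerable"}),
               removes=frozenset({"vulnerable"}))
attacker_goal = {"web": frozenset({"compromised"})}

new_state = step(state, [exploit, patch], random.Random(0))
print(dict(new_state), goal_satisfied(new_state, attacker_goal))
```

In this example the attacker and defender act on the same step. Both preconditions hold, so the node ends up patched but already compromised, and the attacker's goal is met. The order-based conflict rule is a simplifying assumption. A real simulator would need an explicit policy for simultaneous, conflicting actions.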
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
On Technique Identification and Threat-Actor Attribution using LLMs and Embedding Models
Guru, Kyla, Moss, Robert J., Kochenderfer, Mykel J.
Attribution of cyber-attacks remains a complex but critical challenge for cyber defenders. Currently, manually extracting behavioral indicators from dense forensic documentation causes significant attribution delays, especially after major incidents of international scale. This research evaluates large language models (LLMs) for cyber-attack attribution based on behavioral indicators extracted from forensic documentation. We test OpenAI's GPT-4 and text-embedding-3-large for identifying threat actors' tactics, techniques, and procedures (TTPs) by comparing LLM-generated TTPs against human-generated data from MITRE ATT&CK Groups. Our framework then identifies TTPs from text using vector embedding search and builds profiles from which a machine learning model learns to attribute new attacks. Key contributions include: (1) assessing off-the-shelf LLMs for TTP extraction and attribution, and (2) developing an end-to-end pipeline from raw CTI documents to threat-actor prediction. This research finds that standard LLMs generate noisy TTP datasets with low similarity to human-generated datasets. However, the generated TTPs occur with frequencies similar to those in the existing MITRE datasets. Moreover, although these TTPs differ from the human-generated datasets, our work demonstrates that they are still useful for training a model that performs above baseline on attribution. Project code and files are available at https://github.com/kylag/ttp_attribution.
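The vector embedding search step can be viewed as nearest-neighbor retrieval. First, embed a passage from a CTI report and each ATT&CK technique description. Then keep the techniques whose cosine similarity to the passage clears a threshold. A minimal sketch, with hand-made 3-dimensional vectors standing in for text-embedding-3-large outputs. The technique IDs are real ATT&CK identifiers used only as labels. The vectors, threshold, `top_k`, and helper names are hypothetical and do not come from the project repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_techniques(
    passage_vec: Sequence[float],
    technique_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.8,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k (technique_id, similarity) pairs at or above threshold, best first."""
    scored = [(tid, cosine(passage_vec, vec)) for tid, vec in technique_vecs.items()]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings"; real embedding vectors have far more dimensions.
techniques = {
    "T1566": [0.9, 0.1, 0.0],  # Phishing
    "T1059": [0.1, 0.9, 0.1],  # Command and Scripting Interpreter
    "T1486": [0.0, 0.2, 0.9],  # Data Encrypted for Impact
}
passage = [0.8, 0.3, 0.0]  # stands in for a sentence describing spear-phishing emails
result = match_techniques(passage, techniques)
print(result)
```

Collecting the matched technique IDs across all passages of a report yields a TTP set. Profiles built from such sets would then serve as training input for the attribution model the abstract describes.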
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.69)