Threat Intelligence


Scalable Hierarchical AI-Blockchain Framework for Real-Time Anomaly Detection in Large-Scale Autonomous Vehicle Networks

Shit, Rathin Chandra, Subudhi, Sharmila

arXiv.org Artificial Intelligence

Purpose: The security of autonomous vehicle networks faces major challenges owing to the complexity of sensor integration, real-time performance demands, and distributed communication protocols that expose vast attack surfaces threatening both individual-vehicle and network-wide safety. Existing security schemes cannot provide sub-10 ms (millisecond) anomaly detection and distributed coordination for large-scale vehicle networks within an acceptable safety/privacy framework. Method: This paper introduces HAVEN (Hierarchical Autonomous Vehicle Enhanced Network), a three-tier hybrid security architecture that decouples real-time local threat detection from distributed coordination operations. It incorporates a lightweight ensemble anomaly detection model at the edge (first layer), Byzantine-fault-tolerant federated learning to aggregate threat intelligence at a regional scale (middle layer), and selective blockchain mechanisms (top layer) to secure critical security coordination. Result: Extensive experimentation is conducted on a real-world autonomous driving dataset. Large-scale simulations with 100 to 1000 vehicles and different attack types, such as sensor spoofing, jamming, and adversarial model poisoning, test the scalability and resiliency of HAVEN. Conclusion: The proposed framework resolves the important tradeoff between real-time safety obligations and distributed security coordination through novel three-tier processing. HAVEN's scalable architecture is shown to deliver substantial improvements in detection accuracy and network resilience over competing methods. Introduction: The unprecedented rise of autonomous vehicles (AVs) has transformed the transport industry by providing unparalleled connectivity, intelligence, and an accessible transportation medium.
These vehicles process multimodal sensor data collected from LiDAR, cameras, radar, and GPS/IMU sensors networked over a CAN (Controller Area Network) bus, generating terabytes of data per day that require real-time analysis for safe operation [1, 2, 3]. However, automotive systems face sophisticated security threats in these safety-critical domains. Further, the distributed nature of vehicular networks adds significant computational complexity and inherent difficulty to developing efficient cybersecurity frameworks that detect a wide array of threats in real time.
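The edge tier's lightweight ensemble could be sketched as a majority vote over cheap per-sensor detectors. A minimal illustration, assuming hypothetical z-score and plausibility-range detectors (the paper does not specify its ensemble members):

```python
from statistics import mean, stdev

class ZScoreDetector:
    """Flags readings far from the recent mean (a simple spoofing heuristic)."""
    def __init__(self, window=20, threshold=3.0):
        self.window, self.threshold = window, threshold
    def is_anomalous(self, history, value):
        recent = history[-self.window:]
        if len(recent) < 2:
            return False
        mu, sigma = mean(recent), stdev(recent)
        return sigma > 0 and abs(value - mu) / sigma > self.threshold

class RangeDetector:
    """Flags readings outside a physically plausible sensor range."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def is_anomalous(self, history, value):
        return not (self.lo <= value <= self.hi)

def ensemble_vote(detectors, history, value):
    """Majority vote across the per-sensor detectors."""
    votes = sum(d.is_anomalous(history, value) for d in detectors)
    return votes >= (len(detectors) + 1) // 2

detectors = [ZScoreDetector(), RangeDetector(lo=0.0, hi=120.0)]
history = [60.0, 61.2, 59.8, 60.5, 60.1]   # recent speed readings (km/h)
print(ensemble_vote(detectors, history, 60.3))   # plausible reading → False
print(ensemble_vote(detectors, history, 400.0))  # spoofed reading → True
```

Such per-reading checks are cheap enough to run within tight edge-latency budgets; the regional and blockchain tiers then operate on aggregated alerts rather than raw sensor streams.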


Exploratory Analysis of Cyberattack Patterns on E-Commerce Platforms Using Statistical Methods

Adeniya, Fatimo Adenike

arXiv.org Artificial Intelligence

Cyberattacks on e-commerce platforms have grown in sophistication, threatening consumer trust and operational continuity. This research presents a hybrid analytical framework that integrates statistical modelling and machine learning for detecting and forecasting cyberattack patterns in the e-commerce domain. Using the Verizon Community Data Breach (VCDB) dataset, the study applies Auto ARIMA for temporal forecasting and significance testing, including a Mann-Whitney U test (U = 2579981.5, p = 0.0121), which confirmed that holiday shopping events experienced significantly more severe cyberattacks than non-holiday periods. ANOVA was also used to examine seasonal variation in threat severity, while ensemble machine learning models (XGBoost, LightGBM, and CatBoost) were employed for predictive classification. Results reveal recurrent attack spikes during high-risk periods such as Black Friday and holiday seasons, with breaches involving Personally Identifiable Information (PII) exhibiting elevated threat indicators. Among the models, CatBoost achieved the highest performance (accuracy = 85.29%, F1 score = 0.2254, ROC AUC = 0.8247). The framework uniquely combines seasonal forecasting with interpretable ensemble learning, enabling temporal risk anticipation and breach-type classification. Ethical considerations, including responsible use of sensitive data and bias assessment, were incorporated. Despite class imbalance and reliance on historical data, the study provides insights for proactive cybersecurity resource allocation and outlines directions for future real-time threat detection research.
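The Mann-Whitney U statistic used above reduces to a rank-sum computation. A self-contained sketch (the study itself would use a statistics library such as SciPy; the severity scores below are invented for illustration):

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for group xs, with average ranks for ties."""
    combined = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    values = [v for v, _ in combined]
    n = len(values)
    rank_of = [0.0] * n
    i = 0
    while i < n:                        # assign average ranks to runs of ties
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        avg_rank = (i + j) / 2 + 1      # ranks are 1-based
        for k in range(i, j + 1):
            rank_of[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(rank_of[k] for k in range(n) if combined[k][1] == 0)
    return rank_sum_x - len(xs) * (len(xs) + 1) / 2

holiday = [9.1, 8.7, 9.5, 7.8]          # hypothetical severity scores
non_holiday = [4.2, 5.0, 3.9, 6.1]
print(mann_whitney_u(holiday, non_holiday))  # → 16.0 (= n1*n2, complete separation)
```

A U near n1·n2 (or near 0) indicates the two groups barely overlap, which is the pattern behind the reported p = 0.0121 for holiday vs. non-holiday breaches.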


AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Alam, Md Tanvirul, Bhusal, Dipkamal, Ahmad, Salman, Rastogi, Nidhi, Worth, Peter

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated strong capabilities in natural language reasoning, yet their application to Cyber Threat Intelligence (CTI) remains limited. CTI analysis involves distilling large volumes of unstructured reports into actionable knowledge, a process where LLMs could substantially reduce analyst workload. CTIBench introduced a comprehensive benchmark for evaluating LLMs across multiple CTI tasks. In this work, we extend CTIBench by developing AthenaBench, an enhanced benchmark that includes an improved dataset creation pipeline, duplicate removal, refined evaluation metrics, and a new task focused on risk mitigation strategies. We evaluate twelve LLMs, including state-of-the-art proprietary models such as GPT-5 and Gemini-2.5 Pro, alongside seven open-source models from the LLaMA and Qwen families. While proprietary LLMs achieve stronger results overall, their performance remains subpar on reasoning-intensive tasks, such as threat actor attribution and risk mitigation, with open-source models trailing even further behind. These findings highlight fundamental limitations in the reasoning capabilities of current LLMs and underscore the need for models explicitly tailored to CTI workflows and automation.


SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots

Adebimpe, Adetayo, Neukirchen, Helmut, Welsh, Thomas

arXiv.org Artificial Intelligence

Honeypots are decoy systems used to gather valuable threat intelligence or divert attackers away from production systems. Maximising attacker engagement is essential to their utility. However, research has highlighted that context-awareness, such as the ability to respond to new attack types, systems, and attacker agents, is necessary to increase engagement. Large Language Models (LLMs) have been shown to be one approach to increasing context awareness, but they suffer from several challenges, including accuracy, timeliness of response, high operational costs, and data-protection issues due to cloud deployment. We propose the System-Based Attention Shell Honeypot (SBASH) framework, which manages data-protection issues through the use of lightweight local LLMs. We investigate Retrieval-Augmented Generation (RAG)-supported LLMs and non-RAG LLMs for Linux shell commands and evaluate them using several metrics, such as response-time differences, realism as judged by human testers, and similarity to a real system calculated with Levenshtein distance, SBert, and BertScore. We show that RAG improves accuracy for untuned models, while models tuned via a system prompt that tells the LLM to respond like a Linux system achieve, without RAG, an accuracy similar to that of untuned models with RAG, at slightly lower latency.
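The Levenshtein-based similarity metric mentioned above compares a generated shell response against the real system's output. A minimal sketch of that comparison (normalisation choice is an assumption; the paper may normalise differently):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def similarity(real_output, generated_output):
    """Normalise edit distance into a 0..1 similarity score."""
    if not real_output and not generated_output:
        return 1.0
    dist = levenshtein(real_output, generated_output)
    return 1 - dist / max(len(real_output), len(generated_output))

print(levenshtein("kitten", "sitting"))  # → 3
```

A score near 1.0 means the honeypot's response is nearly indistinguishable, character for character, from the real system's.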


TITAN: Graph-Executable Reasoning for Cyber Threat Intelligence

Simoni, Marco, Fontana, Aleksandar, Saracino, Andrea, Mori, Paolo

arXiv.org Artificial Intelligence

TITAN (Threat Intelligence Through Automated Navigation) is a framework that connects natural-language cyber-threat queries with executable reasoning over a structured knowledge graph. It integrates a path-planner model, which predicts logical relation chains from text, and a graph executor that traverses the TITAN Ontology to retrieve factual answers and supporting evidence. Unlike traditional retrieval systems, TITAN operates on a typed, bidirectional graph derived from MITRE ATT&CK, allowing reasoning to move clearly and reversibly between threats, behaviors, and defenses. To support training and evaluation, we introduce the TITAN Dataset, a corpus of 88,209 examples (Train: 74,258; Test: 13,951) pairing natural-language questions with executable reasoning paths and step-by-step Chain-of-Thought explanations. Empirical evaluations show that TITAN enables models to generate syntactically valid and semantically coherent reasoning paths that can be deterministically executed on the underlying graph.
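The executor side of such a system amounts to following a predicted chain of typed relations across the graph. A toy sketch, assuming an adjacency-map representation and illustrative relation names (the ATT&CK IDs T1566 and M1017 are real, but this graph and schema are invented for the example):

```python
# node -> relation -> set of neighbours; typed edges stored in both directions
GRAPH = {
    "APT-X": {"uses": {"T1566"}},
    "T1566": {"used-by": {"APT-X"}, "mitigated-by": {"M1017"}},
    "M1017": {"mitigates": {"T1566"}},
}

def execute_path(graph, start, relations):
    """Follow a chain of typed relations from a start node; return end nodes."""
    frontier = {start}
    for rel in relations:
        frontier = {nxt for node in frontier
                        for nxt in graph.get(node, {}).get(rel, set())}
        if not frontier:          # chain broke: no edge of this type
            return set()
    return frontier

# "Which mitigations apply to the techniques used by APT-X?"
print(execute_path(GRAPH, "APT-X", ["uses", "mitigated-by"]))  # → {'M1017'}
```

Because every hop is a typed edge lookup, execution is deterministic: the model only has to predict the relation chain, and the graph supplies the factual answer and its supporting path.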


SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence

Aghaei, Ehsan, Jain, Sarthak, Arun, Prashanth, Sambamoorthy, Arjun

arXiv.org Artificial Intelligence

Effective analysis of cybersecurity and threat intelligence data demands language models that can interpret specialized terminology, complex document structures, and the interdependence of natural language and source code. Encoder-only transformer architectures provide efficient and robust representations that support critical tasks such as semantic search, technical entity extraction, and semantic analysis, which are key to automated threat detection, incident triage, and vulnerability assessment. However, general-purpose language models often lack the domain-specific adaptation required for high precision. We present SecureBERT 2.0, an enhanced encoder-only language model purpose-built for cybersecurity applications. Leveraging the ModernBERT architecture, SecureBERT 2.0 introduces improved long-context modeling and hierarchical encoding, enabling effective processing of extended and heterogeneous documents, including threat reports and source code artifacts. Pretrained on a domain-specific corpus more than thirteen times larger than its predecessor, comprising over 13 billion text tokens and 53 million code tokens from diverse real-world sources, SecureBERT 2.0 achieves state-of-the-art performance on multiple cybersecurity benchmarks. Experimental results demonstrate substantial improvements in semantic search for threat intelligence, semantic analysis, cybersecurity-specific named entity recognition, and automated vulnerability detection in code within the cybersecurity domain.
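The semantic-search task such encoders support boils down to ranking documents by embedding similarity. A toy sketch with hand-made 2-d vectors (real encoder embeddings have hundreds of dimensions, and the document IDs here are invented):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, corpus, top_k=2):
    """Rank documents by cosine similarity of their precomputed embeddings."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc["vec"]),
                    reverse=True)
    return [doc["id"] for doc in ranked[:top_k]]

corpus = [
    {"id": "report-1", "vec": [1.0, 0.0]},
    {"id": "report-2", "vec": [0.0, 1.0]},
    {"id": "report-3", "vec": [0.9, 0.1]},
]
print(semantic_search([1.0, 0.05], corpus))  # → ['report-1', 'report-3']
```

The model's contribution is in producing embeddings where semantically related threat reports land close together; the search step itself stays this simple (or is delegated to an approximate nearest-neighbour index at scale).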


Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence

Meng, Yuqiao, Tang, Luoxi, Yu, Feiyang, Jia, Jinyuan, Yan, Guanhua, Yang, Ping, Xi, Zhaohan

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are intensively used to assist security analysts in counteracting the rapid exploitation of cyber threats, wherein LLMs offer cyber threat intelligence (CTI) to support vulnerability assessment and incident response. While recent work has shown that LLMs can support a wide range of CTI tasks such as threat analysis, vulnerability detection, and intrusion defense, significant performance gaps persist in practical deployments. In this paper, we investigate the intrinsic vulnerabilities of LLMs in CTI, focusing on challenges that arise from the nature of the threat landscape itself rather than the model architecture. Using large-scale evaluations across multiple CTI benchmarks and real-world threat reports, we introduce a novel categorization methodology that integrates stratification, autoregressive refinement, and human-in-the-loop supervision to reliably analyze failure instances. Through extensive experiments and human inspections, we reveal three fundamental vulnerabilities (spurious correlations, contradictory knowledge, and constrained generalization) that limit LLMs in effectively supporting CTI. Subsequently, we provide actionable insights for designing more robust LLM-powered CTI systems to facilitate future research.


Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting

Meng, Yuqiao, Tang, Luoxi, Yu, Feiyang, Li, Xi, Yan, Guanhua, Yang, Ping, Xi, Zhaohan

arXiv.org Artificial Intelligence

As cyber threats continue to grow in scale and sophistication, blue team defenders increasingly require advanced tools to proactively detect and mitigate risks. Large Language Models (LLMs) offer promising capabilities for enhancing threat analysis. However, their effectiveness in real-world blue team threat-hunting scenarios remains insufficiently explored. This paper presents CyberTeam, a benchmark designed to guide LLMs in blue teaming practice. CyberTeam constructs a standardized workflow in two stages. First, it models realistic threat-hunting workflows by capturing the dependencies among analytical tasks from threat attribution to incident response. Next, each task is addressed through a set of operational modules tailored to its specific analytical requirements. This transforms threat hunting into a structured sequence of reasoning steps, with each step grounded in a discrete operation and ordered according to task-specific dependencies. Guided by this framework, LLMs are directed to perform threat-hunting tasks through modularized steps. Overall, CyberTeam integrates 30 tasks and 9 operational modules to guide LLMs through standardized threat analysis. We evaluate both leading LLMs and state-of-the-art cybersecurity agents, comparing CyberTeam against open-ended reasoning strategies. Our results highlight the improvements enabled by standardized design, while also revealing the limitations of open-ended reasoning in real-world threat hunting.
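Modelling dependencies among analytical tasks, as CyberTeam's first stage does, maps naturally onto topological ordering. A minimal sketch using Python's standard-library `graphlib`; the task names and dependency structure below are hypothetical, not CyberTeam's actual 30-task graph:

```python
from graphlib import TopologicalSorter

# Hypothetical threat-hunting tasks; each maps to the tasks it depends on.
DEPS = {
    "threat-attribution": set(),
    "ioc-extraction": {"threat-attribution"},
    "impact-assessment": {"ioc-extraction"},
    "incident-response": {"impact-assessment", "ioc-extraction"},
}

# static_order() yields tasks so that every dependency precedes its dependents,
# giving the LLM a fixed sequence of reasoning steps to execute.
order = list(TopologicalSorter(DEPS).static_order())
print(order)
```

Grounding each step in a discrete, dependency-ordered operation is what lets the benchmark compare structured workflows against open-ended reasoning on equal footing.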


CTI Dataset Construction from Telegram

Arikkat, Dincy R., T., Sneha B., Nicolazzo, Serena, Nocera, Antonino, P., Vinod, A., Rafidha Rehiman K., R, Karthika

arXiv.org Artificial Intelligence

Cyber Threat Intelligence (CTI) has become indispensable for security analysts, enabling them to identify, collect, manage, and disseminate information on vulnerabilities and attacks, and to respond proactively to emerging threats [6]. Within the CTI lifecycle, data collection encompassing sources such as security alerts and threat intelligence reports from the web represents a critical foundational stage [3]. In this context, one challenge is that not all threat intelligence is published in standard CTI databases or integrated into commercial security platforms. Valuable CTI is often disseminated through unstructured channels such as blogs, social media posts, or reports from security companies and independent experts. To capture these dispersed insights, multiple online sources can be leveraged as early signals of emerging cyber threats. Information gathering thus becomes the first and most critical step, enabling the collection of relevant data on newly discovered vulnerabilities, active exploits, security alerts, threat intelligence reports, and security tool configurations. Curating CTI datasets requires addressing key challenges, including data sourcing from heterogeneous streams, ensuring data reliability, preserving privacy, and mitigating bias. A well-designed CTI dataset not only accelerates the advancement of automated threat intelligence systems but also strengthens global cyber defense capabilities through knowledge sharing and standardized evaluation frameworks. While platforms like Twitter [20] have been widely explored for their CTI potential, other communication ecosystems remain underexamined.
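A first pass over such unstructured channels typically extracts candidate indicators of compromise (IOCs) with pattern matching. A minimal regex sketch (real pipelines add defanging rules, validation, and far richer indicator grammars; the sample message is invented, though CVE-2024-3400 is a real identifier and 203.0.113.7 is a reserved documentation address):

```python
import re

# Simple indicator patterns; illustrative, not exhaustive.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,7}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}

def extract_iocs(text):
    """Pull candidate indicators of compromise out of an unstructured post."""
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}

msg = "New exploit for CVE-2024-3400 seen beaconing to 203.0.113.7"
print(extract_iocs(msg))
```

Output of this pass would then feed the reliability, privacy, and bias checks the passage identifies as the harder part of dataset curation.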


Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints

Gill, Waris, Isak, Natalie, Dressman, Matthew

arXiv.org Artificial Intelligence

The widespread deployment of LLMs across enterprise services has created a critical security blind spot. Organizations operate multiple LLM services handling billions of queries daily, yet regulatory compliance boundaries prevent these services from sharing threat intelligence about prompt injection attacks, the top security risk for LLMs. When an attack is detected in one service, the same threat may persist undetected in others for months, as privacy regulations prohibit sharing user prompts across compliance boundaries. We present BinaryShield, the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries. BinaryShield transforms suspicious prompts through a unique pipeline combining PII redaction, semantic embedding, binary quantization, and a randomized response mechanism to generate non-invertible fingerprints that preserve attack patterns while providing privacy. Our evaluations demonstrate that BinaryShield achieves an F1-score of 0.94, significantly outperforming SimHash (0.77), the privacy-preserving baseline, while achieving 64x storage reduction and 38x faster similarity search compared to dense embeddings.
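The back half of that pipeline, binary quantization plus randomized response and bitwise matching, can be sketched in a few lines. This is a generic illustration of those standard building blocks, not BinaryShield's actual parameters or code; the embedding vector and flip probability are invented:

```python
import random

def binary_quantize(vec):
    """Sign-quantize an embedding into a compact bit vector."""
    return [1 if x > 0 else 0 for x in vec]

def randomized_response(bits, flip_prob, rng):
    """Flip each bit with probability flip_prob, adding plausible deniability."""
    return [b ^ (rng.random() < flip_prob) for b in bits]

def hamming_similarity(a, b):
    """Fraction of matching bits between two fingerprints."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
emb = [0.3, -1.2, 0.8, -0.1, 2.0, -0.6]           # toy embedding vector
fingerprint = randomized_response(binary_quantize(emb), flip_prob=0.1, rng=rng)
print(hamming_similarity(binary_quantize(emb), fingerprint))
```

Because only noisy bits cross the compliance boundary, services can compare fingerprints via cheap Hamming-distance lookups without ever exchanging the underlying prompts.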