audit log
Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs
Warner, Benjamin C., Kannampallil, Thomas, Kim, Seunghwan
EHR audit logs are a highly granular stream of events that capture clinician activities, and is a significant area of interest for research in characterizing clinician workflow on the electronic health record (EHR). Existing techniques to measure the complexity of workflow through EHR audit logs (audit logs) involve time- or frequency-based cross-sectional aggregations that are unable to capture the full complexity of a EHR session. We briefly evaluate the usage of transformer-based tabular language model (tabular LM) in measuring the entropy or disorderedness of action sequences within workflow and release the evaluated models publicly.
MAGIC: Detecting Advanced Persistent Threats via Masked Graph Representation Learning
Jia, Zian, Xiong, Yun, Nan, Yuhong, Zhang, Yao, Zhao, Jinjing, Wen, Mi
Advance Persistent Threats (APTs), adopted by most delicate attackers, are becoming increasing common and pose great threat to various enterprises and institutions. Data provenance analysis on provenance graphs has emerged as a common approach in APT detection. However, previous works have exhibited several shortcomings: (1) requiring attack-containing data and a priori knowledge of APTs, (2) failing in extracting the rich contextual information buried within provenance graphs and (3) becoming impracticable due to their prohibitive computation overhead and memory consumption. In this paper, we introduce MAGIC, a novel and flexible self-supervised APT detection approach capable of performing multi-granularity detection under different level of supervision. MAGIC leverages masked graph representation learning to model benign system entities and behaviors, performing efficient deep feature extraction and structure abstraction on provenance graphs. By ferreting out anomalous system behaviors via outlier detection methods, MAGIC is able to perform both system entity level and batched log level APT detection. MAGIC is specially designed to handle concept drift with a model adaption mechanism and successfully applies to universal conditions and detection scenarios. We evaluate MAGIC on three widely-used datasets, including both real-world and simulated attacks. Evaluation results indicate that MAGIC achieves promising detection results in all scenarios and shows enormous advantage over state-of-the-art APT detection approaches in performance overhead.
Zebra: Deeply Integrating System-Level Provenance Search and Tracking for Efficient Attack Investigation
Yang, Xinyu, Liu, Haoyuan, Wang, Ziyu, Gao, Peng
However, a key limitation is that their DSLs can only search for events that are located within a close subgraph neighborhood. System auditing has emerged as a key approach for monitoring Thus, these approaches cannot efficiently reveal faraway system call events and investigating sophisticated attacks. Based on events on a long-range attack sequence, which is observed in many the collected audit logs, research has proposed to search for attack of the attacks these days due to their sophisticated, multi-stage patterns or track the causal dependencies of system events to reveal nature [55]. Tracking-based approaches assume causal dependencies the attack sequence. However, existing approaches either cannot between system entities that are involved in the same system reveal long-range attack sequences or suffer from the dependency event (e.g., a process reading a file) [45, 48, 52, 54]. Based on this explosion problem due to a lack of focus on attack-relevant parts, assumption, these approaches track the dependencies from a Point and thus are insufficient for investigating complex attacks. of Interest (POI) event (e.g., an alert event like the creation of a To bridge the gap, we propose Zebra, a system that synergistically suspicious file) and construct a system dependency graph, in which integrates attack pattern search and causal dependency tracking nodes represent system entities and edges represent system events.
How to Decide on a Dataset for Detecting Cyber-Attacks
You create an amazing machine learning algorithm. You take a novel approach and apply techniques that prove to be highly accurate. Your results demonstrate a very high true positive rate and a very low false positive rate. You write a paper that articulates your outstanding results and submit it to a leading academic conference. You expect that this research will be well received, and you will receive many citations of your work.
EXTRACTOR: Extracting Attack Behavior from Threat Reports
Satvat, Kiavash, Gjomemo, Rigel, Venkatakrishnan, V. N.
The knowledge on attacks contained in Cyber Threat Intelligence (CTI) reports is very important to effectively identify and quickly respond to cyber threats. However, this knowledge is often embedded in large amounts of text, and therefore difficult to use effectively. To address this challenge, we propose a novel approach and tool called EXTRACTOR that allows precise automatic extraction of concise attack behaviors from CTI reports. EXTRACTOR makes no strong assumptions about the text and is capable of extracting attack behaviors as provenance graphs from unstructured text. We evaluate EXTRACTOR using real-world incident reports from various sources as well as reports of DARPA adversarial engagements that involve several attack campaigns on various OS platforms of Windows, Linux, and FreeBSD. Our evaluation results show that EXTRACTOR can extract concise provenance graphs from CTI reports and show that these graphs can successfully be used by cyber-analytics tools in threat-hunting.