Iannacone, Michael D.
A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset
Verma, Miki E., Bridges, Robert A., Iannacone, Michael D., Hollifield, Samuel C., Moriano, Pablo, Hespeler, Steven C., Kay, Bill, Combs, Frank L.
Although ubiquitous in modern vehicles, Controller Area Networks (CANs) lack basic security properties and are easily exploitable. A rapidly growing field of CAN security research has emerged that seeks to detect intrusions on CANs. Producing vehicular CAN data with a variety of intrusions is out of reach for most researchers, as it requires expensive assets and expertise. To assist researchers, we present the first comprehensive guide to the existing open CAN intrusion datasets, including a quality analysis of each dataset and an enumeration of each one's benefits, drawbacks, and suggested use cases. Current public CAN IDS datasets are limited to real fabrication (simple message injection) attacks and simulated attacks, often in synthetic data, which lack fidelity. In general, the physical effects of attacks on the vehicle are not verified in the available datasets. Only one dataset provides signal-translated data, but without a corresponding raw binary version. Overall, the available data pigeonholes CAN IDS work into testing on limited, often inappropriate data (usually with attacks that are too easily detectable to truly test the method), and this lack of data has stymied comparability and reproducibility of results. As our primary contribution, we present the ROAD (Real ORNL Automotive Dynamometer) CAN Intrusion Dataset, consisting of over 3.5 hours of one vehicle's CAN data. ROAD contains ambient data recorded during a diverse set of activities, and attacks of increasing stealth, with multiple variants and instances of real fuzzing, fabrication, and unique advanced attacks, as well as simulated masquerade attacks. To facilitate benchmarking of CAN IDS methods that require signal-translated inputs, we also provide a signal time series format for many of the CAN captures. Our contributions aim to facilitate appropriate benchmarking and much-needed comparability in the CAN IDS field.
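To make the raw CAN data format concrete, the following is a minimal sketch of parsing a candump-style log line into a timestamp, arbitration ID, and payload bytes. The "(ts) iface ID#DATA" layout, the `parse_line` helper, and the sample frame are illustrative assumptions, not a specification of the ROAD release files.

```python
# Minimal sketch: parse a candump-style CAN log line into (timestamp, ID, payload).
# The "(ts) iface ID#DATA" layout is an assumed capture format for illustration,
# not a documented specification of the ROAD dataset.
import re

LINE_RE = re.compile(
    r"\((?P<ts>[\d.]+)\)\s+\S+\s+(?P<id>[0-9A-Fa-f]+)#(?P<data>[0-9A-Fa-f]*)"
)

def parse_line(line: str):
    """Return (timestamp, arbitration_id, payload_bytes), or None if malformed."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = float(m.group("ts"))
    arb_id = int(m.group("id"), 16)       # arbitration ID is hex in the log
    payload = bytes.fromhex(m.group("data"))
    return ts, arb_id, payload

# A fabrication (message-injection) attack typically appears as a burst of
# frames sharing one arbitration ID at an abnormally high frequency.
sample = "(1600000000.123456) can0 0D0#FF00FF00FF00FF00"
print(parse_line(sample))
# -> (1600000000.123456, 208, b'\xff\x00\xff\x00\xff\x00\xff\x00')
```

Masquerade attacks, by contrast, preserve the expected frequency of an ID while altering its payload, which is why the abstract describes them as stealthier and harder to detect from timing alone.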
Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection
Bridges, Robert A., Oesch, Sean, Verma, Miki E., Iannacone, Michael D., Huffer, Kelly M. T., Jewell, Brian, Nichols, Jeff A., Weber, Brian, Beaver, Justin M., Smith, Jared M., Scofield, Daniel, Miles, Craig, Plummer, Thomas, Daniell, Mark, Tall, Anne M.
Attackers use malicious software, known as malware, to steal sensitive data, damage network infrastructure, and hold information for ransom. One of the top priorities for computer security tools is to detect malware and to prevent or minimize its impact on both corporate and personal networks. Traditionally, signature-based methods have been used to detect files previously identified as malicious with near-perfect precision, but they can miss newer malware samples. With the advent of self-modifying malware and the rapid increase in novel threats, signature-based methods are insufficient on their own. By generalizing patterns from known benign/malicious training examples, machine learning (ML) has exhibited the capability to quickly and accurately classify novel file samples in many research studies [19]. Moreover, ML-based malware detection has transitioned from the subject of myriad research efforts to a mainstay of commercial-off-the-shelf (COTS) malware detectors. Yet few practical evaluations of COTS ML-based technologies have been conducted. Turning from the academic literature to market reports from commercial companies can provide (for a fee) useful information: specifically, end-user feedback, an itemization of all technologies in the antivirus/endpoint detection and response marketplace [17], and even statistics on the efficacy of the detectors in malware tests [4, 40].
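The contrast between signature matching and ML generalization can be illustrated with a toy sketch. The hash set, byte-histogram features, random training data, and classifier choice below are all assumptions for exposition; they are not the methodology or the detectors evaluated in the paper.

```python
# Illustrative sketch only: contrasts exact signature lookup with an ML
# classifier trained on simple static features (per-byte histograms).
# All data and feature choices here are toy assumptions for exposition.
import hashlib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical placeholder hash of a previously identified malicious file.
KNOWN_MALICIOUS_SHA256 = {"0f" * 32}

def signature_detect(file_bytes: bytes) -> bool:
    # Near-perfect precision on known samples, but blind to novel or
    # self-modifying malware whose hash has never been seen before.
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_MALICIOUS_SHA256

def byte_histogram(file_bytes: bytes) -> np.ndarray:
    # 256-bin normalized byte-frequency vector: a common static feature.
    counts = np.bincount(np.frombuffer(file_bytes, dtype=np.uint8), minlength=256)
    return counts / max(len(file_bytes), 1)

# Toy training set: X holds feature vectors, y holds benign(0)/malicious(1) labels.
rng = np.random.default_rng(0)
X = rng.random((100, 256))
y = rng.integers(0, 2, 100)
clf = RandomForestClassifier(n_estimators=50).fit(X, y)

novel = b"MZ...a never-before-seen sample..."
print(signature_detect(novel))                                   # False: no hash match
score = clf.predict_proba(byte_histogram(novel)[None, :])[0, 1]
print(f"ML maliciousness score: {score:.2f}")                    # generalizes beyond hashes
```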
GraphPrints: Towards a Graph Analytic Method for Network Anomaly Detection
Harshaw, Christopher R., Bridges, Robert A., Iannacone, Michael D., Reed, Joel W., Goodall, John R.
This paper introduces GraphPrints, a novel graph-analytic approach for detecting anomalies in network flow data. Building on foundational network-mining techniques, our method represents time slices of traffic as a graph, then counts graphlets -- small induced subgraphs that describe local topology. By performing outlier detection on the sequence of graphlet counts, anomalous intervals of traffic are identified; furthermore, individual IPs exhibiting abnormal behavior are singled out. Initial testing of GraphPrints is performed on real network data with an implanted anomaly. Evaluation shows false positive rates bounded by 2.84% at the time-interval level and 0.05% at the IP level, with 100% true positive rates at both levels.
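A minimal sketch of the pipeline described above follows, under stated simplifications: a small feature vector (edges, wedges, triangles) stands in for the paper's full graphlet census, a median/MAD rule stands in for its outlier detector, and the per-IP attribution step is omitted. The function names `graphlet_vector` and `flag_outliers` are illustrative, not from the paper.

```python
# Sketch of a GraphPrints-style pipeline under simplifying assumptions:
# each time slice of flows becomes an undirected graph, a 3-element graphlet
# feature vector approximates the full graphlet census, and a median/MAD
# rule approximates the paper's outlier detector. Per-IP scoring is omitted.
import networkx as nx
import numpy as np

def graphlet_vector(flows):
    """flows: iterable of (src_ip, dst_ip) pairs within one time window."""
    g = nx.Graph()
    g.add_edges_from(flows)
    edges = g.number_of_edges()
    # Wedges (paths of length 2), computed from node degrees.
    wedges = sum(d * (d - 1) // 2 for _, d in g.degree())
    triangles = sum(nx.triangles(g).values()) // 3  # each triangle counted at 3 nodes
    return np.array([edges, wedges, triangles], dtype=float)

def flag_outliers(vectors, threshold=3.0):
    """Flag windows whose graphlet counts deviate strongly from the median."""
    X = np.vstack(vectors)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-9  # avoid divide-by-zero
    scores = np.max(np.abs(X - med) / mad, axis=1)   # worst-coordinate deviation
    return scores > threshold

# Nine quiet windows, then one dense window (complete graph on 5 IPs).
windows = [[("10.0.0.1", "10.0.0.2"), ("10.0.0.2", "10.0.0.3")]] * 9
windows.append([(f"10.0.0.{i}", f"10.0.0.{j}") for i in range(5) for j in range(i)])
print(flag_outliers([graphlet_vector(w) for w in windows]))  # last window flagged
```

The design intuition is the one the abstract names: local topology changes (for example, a sudden clique of communicating hosts) shift the graphlet counts even when aggregate traffic volume looks normal.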