Detecting Cyberattacks in Industrial Control Systems Using Online Learning Algorithms

arXiv.org Machine Learning

Industrial control systems are critical to the operation of industrial facilities, especially critical infrastructure such as refineries, power grids, and transportation systems. As with other information systems, a significant threat to industrial control systems is attack from cyberspace: offensive maneuvers launched by "anonymous" actors in the digital world that target computer-based assets with the goal of compromising a system's functions or probing for information. Owing to the importance of industrial control systems and the potentially devastating consequences of an attack, significant effort has gone into securing them against cyberattacks. Among these efforts are intrusion detection systems, which serve as the first line of defense by monitoring and reporting potentially malicious activity. Classical machine-learning-based intrusion detection methods usually build prediction models by learning from modest-sized training samples all at once. Such an approach is not always applicable to industrial control systems, which must process continuous control commands nonstop with limited computational resources. To satisfy these requirements, we propose using online learning to learn prediction models from the control data stream. We introduce several state-of-the-art online learning algorithms by category and illustrate their efficacy on two commonly used testbeds, a power system and a gas pipeline. Further, we explore a new cost-sensitive online learning algorithm to address the class-imbalance problem that is pervasive in industrial intrusion detection systems. Our experimental results indicate that the proposed algorithm can achieve an overall improvement in the detection rate of cyberattacks in industrial control systems.
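
As a rough illustration of the idea of cost-sensitive online learning described in this abstract, the sketch below trains a perceptron one sample at a time and scales each mistake-driven update by a class-dependent cost, so that rare attack samples move the model more than common normal samples. This is not the paper's proposed algorithm, which the abstract does not specify; the function names, the cost values, and the learning rate are all assumptions chosen for illustration.

```python
def train_cost_sensitive_perceptron(stream, n_features,
                                    attack_cost=5.0, normal_cost=1.0, lr=0.1):
    """Online perceptron with class-dependent update sizes (illustrative only).

    `stream` yields (x, y) pairs, where x is a list of floats and
    y is +1 for an attack sample or -1 for a normal sample.
    The cost values are hypothetical, not taken from the paper.
    """
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:  # one pass, one sample at a time
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:  # update only when the sample is misclassified
            cost = attack_cost if y == 1 else normal_cost
            step = lr * cost * y  # a missed attack triggers a larger correction
            w = [wi + step * xi for wi, xi in zip(w, x)]
            b += step
    return w, b


def predict(w, b, x):
    """Return +1 (attack) or -1 (normal) for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1
```

Because each sample is seen once and then discarded, memory use stays constant regardless of how long the control data stream runs, which is the property that makes online learners attractive under limited computational resources.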


Bayesian Network Models for Generation of Crisis Management Training Scenarios

AAAI Conferences

We present a noisy-OR Bayesian network model for simulation-based training, and an efficient search-based algorithm for automatic synthesis of plausible training scenarios from constraint specifications. This randomized algorithm for approximate causal inference is shown to outperform other randomized methods, such as those based on perturbation of the maximally plausible scenario. It has the added advantage of being able to generate acceptable scenarios (based on a maximum penalized likelihood criterion) faster than human subject matter experts, and with greater diversity than deterministic inference. We describe a field-tested interactive training system for crisis management and show how our model can be applied offline to produce scenario specifications. We then evaluate the performance of our automatic scenario generator and compare its results to those achieved by human instructors, stochastic simulation, and maximum likelihood inference. Finally, we discuss the applicability of our system and framework to a broader range of modeling problems for computer-assisted instruction.
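
For readers unfamiliar with the noisy-OR model named in this abstract, the following minimal sketch computes the standard noisy-OR conditional probability: an effect fails to occur only if every active cause independently fails to trigger it (and the optional leak term also fails). The event names and probabilities are hypothetical and do not come from the paper, and the sketch covers only this one conditional probability, not the authors' scenario-search algorithm.

```python
def noisy_or(active_causes, link_probs, leak=0.0):
    """Probability that an effect occurs under a noisy-OR model.

    active_causes: iterable of names of parent events that are present.
    link_probs:    dict mapping each cause to the probability that it
                   alone triggers the effect.
    leak:          probability the effect occurs with no active cause.
    """
    p_effect_absent = 1.0 - leak
    for cause in active_causes:
        # each active cause independently fails with probability 1 - p
        p_effect_absent *= 1.0 - link_probs[cause]
    return 1.0 - p_effect_absent


# Hypothetical crisis events and link probabilities, for illustration only.
link_probs = {"fire": 0.8, "flood": 0.5}
p_evacuation = noisy_or(["fire", "flood"], link_probs)  # 1 - 0.2 * 0.5
```

The appeal of this parameterization is that a node with n parents needs only n link probabilities (plus an optional leak) rather than a full table with 2^n rows.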


Two Key Ways Intelligent Automation Is Changing the Face of Cybersecurity (Ayehu)

#artificialintelligence

Artificial intelligence and machine learning technologies are being integrated into many aspects of our everyday lives. If you use Siri or Amazon Echo, you've already been touched by AI to some degree. One area where this so-called "smart" technology has become particularly valuable is in the realm of cybersecurity. But despite the buzz, it's important to understand the real capabilities of intelligent automation in security.


Machine learning in cybersecurity: How to evaluate offerings

#artificialintelligence

The answer, much like the outputs derived through machine learning algorithms, is neither black nor white. The promise of machine learning in cybersecurity lies in its ability to detect as-yet-unknown threats, particularly those that may lurk in networks for long periods of time seeking their ultimate goals. Machine learning technology does this by distinguishing atypical from typical behavior, while noting and correlating a great number of simultaneous events and data points. But in order to know what constitutes typical activity on a website, endpoint or network at any given time, the machine learning algorithms must be trained on large volumes of data that have already been properly labelled, identified or categorized with distinguishing features that can be assigned and reassigned relative weights. While this may sound logical, machine learning technology is a darker black box than most.