Online Cyber-Attack Detection in Smart Grid: A Reinforcement Learning Approach

Kurt, Mehmet Necip; Ogundijo, Oyetunji; Li, Chong; Wang, Xiaodong

arXiv.org Machine Learning 

Early detection of cyber-attacks is crucial for the safe and reliable operation of the smart grid. In the literature, outlier detection schemes that make sample-by-sample decisions and online detection schemes that require perfect attack models have been proposed. In this paper, we formulate the online attack/anomaly detection problem as a partially observable Markov decision process (POMDP) problem and propose a universal robust online detection algorithm using the framework of model-free reinforcement learning (RL) for POMDPs. Numerical studies illustrate the effectiveness of the proposed RL-based algorithm in timely and accurate detection of cyber-attacks targeting the smart grid.

A. Background and Related Work

The next generation power grid, i.e., the smart grid, relies on advanced control and communication technologies. This critical cyber infrastructure makes the smart grid vulnerable to hostile cyber-attacks [1]-[3]. The main objective of attackers is to damage or mislead the state estimation mechanism in the smart grid in order to cause wide-area power blackouts or to manipulate electricity market prices [4]. There are many types of cyber-attacks; among them, false data injection (FDI), jamming, and denial of service (DoS) attacks are well known. FDI attacks add malicious fake data to meter measurements [5]-[8], jamming attacks corrupt meter measurements via additive noise [9], and DoS attacks block the system's access to meter measurements [8], [10], [11].

The smart grid is a complex network, and any failure or anomaly in one part of the system may cause severe damage to the overall system within a short period of time. Hence, early detection of cyber-attacks is critical for a timely and effective response. In this context, the framework of quickest change detection [12]-[15] is quite useful.
In quickest change detection problems, a change occurs in the sensing environment at an unknown time, and the aim is to detect it as soon as possible, subject to a minimal level of false alarms, based on measurements that become available sequentially over time. After obtaining the measurements at a given time, the decision maker either declares a change or waits for the next time interval to obtain further measurements.
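
The stop-or-continue structure described above can be illustrated with a minimal sketch of the classical CUSUM test. This is not the RL-based algorithm proposed in the paper. It assumes scalar Gaussian measurements whose mean shifts from a known `mu0` to a known `mu1` with known standard deviation `sigma`, which is exactly the kind of perfect model knowledge the abstract notes is required by existing online schemes. The function name, parameters, and sample values are hypothetical and chosen only for illustration.

```python
def cusum_stopping_time(samples, mu0, mu1, sigma, threshold):
    """Return the 1-based time at which a change is declared, or None.

    Assumption (not from the paper): Gaussian samples with known
    pre-change mean mu0, post-change mean mu1, and common std sigma.
    """
    stat = 0.0  # CUSUM decision statistic, reset to zero when it goes negative
    for t, x in enumerate(samples, start=1):
        # Log-likelihood ratio of the post-change vs. pre-change density at x
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t  # stop and declare a change
        # otherwise continue and wait for the next measurement
    return None  # no change declared within the observed samples


# Hypothetical noise-free stream: the mean shifts from 0.0 to 1.0 at sample 6.
stream = [0.0] * 5 + [1.0] * 10
alarm_time = cusum_stopping_time(stream, mu0=0.0, mu1=1.0, sigma=1.0, threshold=2.0)
print(alarm_time)  # 9
```

In this toy stream the statistic stays at zero before the change, grows by 0.5 with each post-change sample, and reaches the threshold of 2.0 at the ninth sample. Raising the threshold lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off that quickest change detection formalizes.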
