Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes
arXiv.org Artificial Intelligence
The honeynet is a promising active cyber defense mechanism. It reveals fundamental Indicators of Compromise (IoC) by luring attackers into conducting adversarial behaviors in a controlled and monitored environment. Active interaction in the honeynet yields a high reward, but it also introduces high implementation costs and the risk of adversarial exploitation of the honeynet. In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off. In particular, we produce adaptive long-term engagement policies that are shown to be risk-averse, cost-effective, and time-efficient. Numerical results demonstrate that our adaptive interaction policies can quickly attract attackers to the target honeypot and engage them long enough to obtain valuable threat information, while keeping the penetration probability low. The results also show that the expected utility is robust across a wide range of attacker persistence and intelligence. Finally, we apply reinforcement learning to the SMDP to address the curse of modeling. With a prudent choice of learning rate and exploration policy, we achieve quick and robust convergence to the optimal policy and value.
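To make the reinforcement-learning step more concrete, the following is a minimal sketch of tabular Q-learning on a discounted SMDP, where each transition carries a random sojourn time. It is not the paper's model: the three states, the two engagement actions, the transition probabilities, the reward rates, the costs, and the discount rate are all hypothetical values chosen for illustration, and the 1/n learning rate and epsilon-greedy exploration are just one possible choice.

```python
import math
import random

# Hypothetical toy model (not from the paper):
# states: 0 = normal node, 1 = honeypot, 2 = absorbing exit
# actions: 0 = low interaction, 1 = high interaction
N_STATES, N_ACTIONS, TERMINAL = 3, 2, 2
GAMMA = 0.1  # assumed continuous-time discount rate


def sojourn_reward(rate, lump_cost, tau, gamma=GAMMA):
    """Reward earned at `rate` over a sojourn of length tau, continuously
    discounted, minus an up-front lump cost."""
    return rate * (1.0 - math.exp(-gamma * tau)) / gamma - lump_cost


def step(state, action, rng):
    """Toy simulator: returns (next_state, sojourn_time, reward_rate, lump_cost)."""
    if state == 0:
        # Higher interaction lures the attacker to the honeypot more often.
        p_to_honeypot = 0.3 if action == 0 else 0.7
        next_state = 1 if rng.random() < p_to_honeypot else 0
        reward_rate = 0.0
    else:
        # Inside the honeypot: higher interaction keeps the attacker longer.
        p_leave = 0.5 if action == 0 else 0.2
        next_state = TERMINAL if rng.random() < p_leave else 1
        reward_rate = 1.0 if action == 0 else 2.0
    tau = rng.expovariate(1.0)  # exponential sojourn time with mean 1 (assumed)
    lump_cost = 0.1 if action == 0 else 0.5
    return next_state, tau, reward_rate, lump_cost


def smdp_q_learning(episodes=2000, epsilon=0.1, seed=0):
    """Tabular SMDP Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != TERMINAL:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, tau, rate, cost = step(state, action, rng)
            # The discount applied to the next state's value depends on
            # the sampled sojourn time, which is what distinguishes an
            # SMDP update from an ordinary MDP update.
            discount = math.exp(-GAMMA * tau)
            target = sojourn_reward(rate, cost, tau) + discount * max(q[next_state])
            visits[state][action] += 1
            alpha = 1.0 / visits[state][action]  # decaying learning rate
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(smdp_q_learning()):
        print(s, [round(v, 3) for v in row])
```

The learner only needs sampled transitions, sojourn times, and rewards from the simulator, never the transition kernel itself, which is the sense in which a model-free method sidesteps the curse of modeling.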
Jun-26-2019
- Country:
- North America > United States (0.46)
- Genre:
- Research Report > New Finding (0.54)
- Industry:
- Information Technology > Security & Privacy (1.00)