Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes

Chao Qu, Xiaoyu Tan, Siqiao Xue, Xiaoming Shi, James Zhang, Hongyuan Mei

arXiv.org Artificial Intelligence 

The last several years have witnessed the great success of reinforcement learning (RL), including video game playing [Mnih et al., 2015], robot manipulation [Gu et al., 2017], autonomous driving [Shalev-Shwartz et al., 2016], and many others [Lazic et al., 2018, Dalal et al., 2016]. Most of this work focuses on problems where the system of interest evolves continuously with time, e.g., the trajectory of a tennis ball. However, conventional RL research overlooks a category of systems that evolve continuously but are abruptly interrupted by stochastic events (see the jumps in Figure 1). Such systems are ubiquitous in the social and information sciences, which calls for RL research in these domains to extend its applicability to real-world problems [Farajtabar et al., 2017, Wang et al., 2018], in which the agent seeks an optimal intervention policy so as to improve the future course of events. Concrete examples include:

- Social media. Social media websites allow users to create and share content. A retweet occurs when a user reshares and broadcasts another user's tweet to their friends and followers. Such stochastic events steer the behavior of other users [Rizoiu et al., 2017]. At the same time, the platform (agent) may seek a policy to effectively mitigate fake news by optimizing the propagation of real news over the network [Farajtabar et al., 2017].
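To make concrete the kind of abruptly arriving, self-exciting events described above, the sketch below simulates a univariate Hawkes process via Ogata's thinning algorithm. This is standard background on temporal point processes, not the paper's proposed method, and the kernel (exponential) and parameters (`mu`, `alpha`, `beta`) are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + alpha * sum_{t_i <= t} exp(-beta * (t - t_i))."""
    events = np.asarray(events, dtype=float)
    return mu + alpha * np.sum(np.exp(-beta * (t - events)))

def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, horizon=20.0, seed=0):
    """Simulate a univariate Hawkes process on [0, horizon] via Ogata's thinning.

    Requires alpha < beta so the branching ratio alpha/beta is below 1 (stability).
    """
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so its current value
        # (right after any jump) is a valid upper bound until the next event.
        lam_bar = hawkes_intensity(t, events, mu, alpha, beta)
        t += rng.exponential(1.0 / lam_bar)  # propose the next arrival time
        if t >= horizon:
            break
        # Thinning step: accept the proposal with probability lambda(t) / lam_bar.
        if rng.uniform() <= hawkes_intensity(t, events, mu, alpha, beta) / lam_bar:
            events.append(t)  # an abrupt stochastic event (e.g., a retweet)
    return np.array(events)

if __name__ == "__main__":
    ts = simulate_hawkes()
    print(f"{ts.size} events; first arrivals: {np.round(ts[:5], 3)}")
```

Each accepted event raises the intensity by `alpha`, making further events more likely in the near term; this self-excitation is what produces the clustered, bursty arrivals (e.g., retweet cascades) that an intervening agent would try to steer.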