Learning Hawkes Processes from a handful of events
Farnood Salehi, William Trouleau, Matthias Grossglauser, Patrick Thiran
Learning the causal-interaction network of a multivariate Hawkes process is a useful task in many applications. Maximum-likelihood estimation is the most common approach when long observation sequences are available. However, when only short sequences are available, the lack of data amplifies the risk of overfitting, and regularization becomes critical. Because hyper-parameter tuning is challenging, state-of-the-art methods parameterize regularizers with only a single shared hyper-parameter, which limits the model's representational power. To address both issues, we develop in this work an efficient algorithm based on variational expectation-maximization. Our approach is able to optimize over an extended set of hyper-parameters. It also accounts for the uncertainty in the model parameters by learning a posterior distribution over them. Experimental results on both synthetic and real datasets show that our approach significantly outperforms state-of-the-art methods on short observation sequences.
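For context, the maximum-likelihood objective the abstract refers to can be sketched as follows. This is a minimal illustration for a multivariate Hawkes process with exponential kernels; the function name, the shared decay rate beta, and the kernel parameterization are our own assumptions, not details from the paper.

```python
import numpy as np

def hawkes_loglik(mu, alpha, beta, events, T):
    """Log-likelihood of a multivariate Hawkes process whose
    excitation kernels are alpha[i, j] * beta * exp(-beta * dt).

    mu:     baseline rates, shape (d,)
    alpha:  excitation matrix, shape (d, d); alpha[i, j] is the
            influence of dimension j on dimension i
    beta:   shared decay rate (scalar, assumed known here)
    events: list of d sorted arrays of event times in [0, T]
    """
    d = len(events)
    ll = 0.0
    for i in range(d):
        # Sum of log-intensities at the events of dimension i.
        for t in events[i]:
            lam = mu[i]
            for j in range(d):
                past = events[j][events[j] < t]
                lam += alpha[i, j] * beta * np.sum(np.exp(-beta * (t - past)))
            ll += np.log(lam)
        # Compensator: integral of the intensity of dimension i over [0, T].
        comp = mu[i] * T
        for j in range(d):
            comp += alpha[i, j] * np.sum(1.0 - np.exp(-beta * (T - events[j])))
        ll -= comp
    return ll
```

Maximizing this objective over `mu` and `alpha` recovers the causal-interaction network; with short sequences, a regularizer on `alpha` would be added to this objective, which is where the hyper-parameter tuning discussed in the abstract enters.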
8744cf92c88433f8cb04a02e6db69a0d-AuthorFeedback.pdf
We thank the reviewers for the detailed and insightful reviews. As the reviewers noted, our work 1) contributes to "a

Thank you for the valuable feedback on this section; we will incorporate it in our next revision. The intuition for the proof of Theorem 3.3 is that the optimization problem is convex over the space of probability

By weak regularization, we refer to the fact that λ → 0 for our Theorem 4.1 to hold. The difficulty with ReLU networks is that if the gradient flow pushes neurons towards 0, issues of differentiability arise. One potential approach to circumvent this issue is to argue that, with correct initialization, the iterates never reach 0. This is an interesting direction for future work, and we thank the reviewer for this suggestion.
873be0705c80679f2c71fbf4d872df59-AuthorFeedback.pdf
We thank the reviewers for their constructive comments. We address the main concerns below. In our implementation, it was crucial to use the improvements from Sec. 3.4. We ran the "positive response" version of ApproPO (Algorithm 5) for 2000 outer-loop iterations (i.e., 2000 updates of λ), but needed to make at most 61 RL

Note that the policy mixture returned by ApproPO is just a weighted combination of the policies from the cache. We will add this discussion to the paper and also update the plots so that they are in terms of transitions rather than trajectories.
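The "weighted combination of the policies from the cache" mentioned above can be sketched as a mixture policy that draws one cached policy per episode according to the mixture weights. The names below are illustrative assumptions, not ApproPO's actual implementation.

```python
import random

def sample_from_mixture(cache, weights):
    """Draw one policy from a weighted mixture.

    cache:   list of policies (callables mapping state -> action)
    weights: nonnegative mixture weights, same length as cache
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample a single cached policy; the agent then follows it
    # for the whole episode.
    return random.choices(cache, weights=probs, k=1)[0]
```

Representing the output this way keeps the returned object small: only the cached policies and their weights need to be stored, not a new policy network.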
86e8f7ab32cfd12577bc2619bc635690-AuthorFeedback.pdf
We thank the reviewers for their valuable comments, and are happy to see feedback such as "concepts presented

The reviewers agree that NOX maps are "definitely novel

We answer questions, address factual errors, and present more details to improve our manuscript. First, we address R1's questions and concerns. The differences in chairs between Tables 4 and 5 are due to different experimental settings: the "Fixed Multi" models were trained with 2, 3, or 5 views, respectively. R2 questions the claims about feature-level aggregation in Tables 2 and 3. Our main claim is that adding more views

We will add this detail in the paper.