Learning Conditional Generative Models for Temporal Point Processes
Xiao, Shuai (Shanghai Jiao Tong University) | Xu, Hongteng (Duke University) | Yan, Junchi (Shanghai Jiao Tong University) | Farajtabar, Mehrdad (Georgia Institute of Technology) | Yang, Xiaokang (Shanghai Jiao Tong University) | Song, Le (Georgia Institute of Technology) | Zha, Hongyuan (Georgia Institute of Technology)
The ability to look into the future is a challenging but alluring task. People want to estimate the occurrence probability of events of interest so that they can take preemptive action. For example, after reviewing the admission history of patients, doctors may warn those at high risk of certain diseases. With access to the work experience of job seekers, headhunters can evaluate a candidate's future career path and recommend a suitable position at the proper time. In these cases, historical observations provide important guidance for predicting future events: not only the order of events but also the time spans between them carry useful information.

Our learning method is based on the following two facts. On one hand, the MLE loss (KL divergence) requires strict matching between two probability distributions and yields unbiased parameter estimates, but it is sensitive to sample noise and outliers. On the other hand, unlike the MLE loss, which considers only the relative probabilities of samples rather than how close they are, the Wasserstein distance is sensitive to the underlying geometric structure of the samples, but it has biased gradients (Bellemare et al. 2017). To combine the strengths of these two losses and mitigate bias exposure in long-term prediction, our method incorporates the Wasserstein distance alongside MLE: both the KL divergence and the Wasserstein distance between generated and real samples are minimized jointly.
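To make the joint objective concrete, below is a minimal PyTorch sketch of the combined loss. It assumes a generator that outputs event-time sequences and a negative log-likelihood computed elsewhere from an intensity model; the names (wasserstein_1d, joint_loss, w1_weight) are illustrative, not the paper's API. For one-dimensional event times with equally sized samples, the empirical Wasserstein-1 distance reduces to the mean absolute difference between the sorted samples.

    # A sketch of the joint KL + Wasserstein objective, assuming a generator
    # that produces event-time sequences. All names here are illustrative.
    import torch

    def wasserstein_1d(gen_times: torch.Tensor, real_times: torch.Tensor) -> torch.Tensor:
        """Empirical W1 distance between batches of event-time sequences.

        For 1-D samples of equal length, W1 reduces to the mean absolute
        difference between the sorted samples (order statistics).
        """
        g, _ = torch.sort(gen_times, dim=-1)
        r, _ = torch.sort(real_times, dim=-1)
        return (g - r).abs().mean()

    def joint_loss(nll: torch.Tensor,
                   gen_times: torch.Tensor,
                   real_times: torch.Tensor,
                   w1_weight: float = 1.0) -> torch.Tensor:
        """MLE (KL) term plus a weighted Wasserstein term, minimized jointly."""
        return nll + w1_weight * wasserstein_1d(gen_times, real_times)

    # Example usage with dummy tensors standing in for model outputs:
    nll = torch.tensor(2.3, requires_grad=True)           # NLL from an intensity model
    gen = torch.rand(32, 50, requires_grad=True)          # generated event times
    real = torch.sort(torch.rand(32, 50), dim=-1).values  # observed event times
    loss = joint_loss(nll, gen, real)
    loss.backward()  # gradients flow through both the MLE and Wasserstein terms

The weight w1_weight trades off the unbiased-but-noise-sensitive MLE term against the geometry-aware-but-biased Wasserstein term; its value would be a tuning choice, not something specified in this excerpt.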