seqfm
Sequence-Aware Factorization Machines for Temporal Predictive Analytics
Tong Chen, Hongzhi Yin, Quoc Viet Hung Nguyen, Wen-Chih Peng, Xue Li, Xiaofang Zhou
In various web applications like targeted advertising and recommender systems, the available categorical features (e.g., product type) are often of great importance but sparse. As a widely adopted solution, models based on Factorization Machines (FMs) are capable of modelling high-order interactions among features for effective sparse predictive analytics. As the volume of web-scale data grows exponentially over time, sparse predictive analytics inevitably involves dynamic and sequential features. However, existing FM-based models assume no temporal order in the data and are unable to capture the sequential dependencies or patterns within the dynamic features, which impedes the performance and adaptivity of these methods. Hence, in this paper, we propose a novel Sequence-Aware Factorization Machine (SeqFM) for temporal predictive analytics, which models feature interactions by fully investigating the effect of sequential dependencies. As static features (e.g., user gender) and dynamic features (e.g., user interacted items) express different semantics, we devise a multi-view self-attention scheme that separately models the effect of static features, dynamic features, and the mutual interactions between static and dynamic features in three different views. In SeqFM, we further map the learned representations of feature interactions to the desired output with a shared residual network. To showcase the versatility and generalizability of SeqFM, we test SeqFM in three popular application scenarios for FM-based models, namely ranking, classification, and regression tasks. Extensive experimental results on six large-scale datasets demonstrate the superior effectiveness and efficiency of SeqFM.
As an important supervised learning scheme, predictive analytics plays a pivotal role in various applications, ranging from recommender systems [1], [2] to financial analysis [3] and online advertising [4], [5].
In practice, the goal of predictive analytics is to learn a mapping function from the observed variables (i.e., features) to the desired output. When dealing with categorical features in predictive analytics, a common approach is to convert such features into one-hot encodings [6]-[8] so that standard regressors like logistic regression [9] and support vector machines [10] can be directly applied. Because the number of possible categories is large, the converted one-hot features are usually high-dimensional but sparse [11], and simply using raw features rarely provides optimal results. The interactions among multiple raw features are usually termed cross features [7] (a.k.a.