Goto

Collaborating Authors

 Wen, Hong


Addressing Information Loss and Interaction Collapse: A Dual Enhanced Attention Framework for Feature Interaction

arXiv.org Artificial Intelligence

The Transformer has proven to be a significant approach in feature interaction for CTR prediction, achieving considerable success in previous works. However, it also presents potential challenges in handling feature interactions. Firstly, Transformers may encounter information loss when capturing feature interactions. By relying on inner products to represent pairwise relationships, they compress raw interaction information, which can result in a degradation of fidelity. Secondly, due to the long-tail features distribution, feature fields with low information-abundance embeddings constrain the information abundance of other fields, leading to collapsed embedding matrices. To tackle these issues, we propose a Dual Attention Framework for Enhanced Feature Interaction, known as Dual Enhanced Attention. This framework integrates two attention mechanisms: the Combo-ID attention mechanism and the collapse-avoiding attention mechanism. The Combo-ID attention mechanism directly retains feature interaction pairs to mitigate information loss, while the collapse-avoiding attention mechanism adaptively filters out low information-abundance interaction pairs to prevent interaction collapse. Extensive experiments conducted on industrial datasets have shown the effectiveness of Dual Enhanced Attention.


Modeling User Viewing Flow Using Large Language Models for Article Recommendation

arXiv.org Artificial Intelligence

This paper proposes the User Viewing Flow Modeling (SINGLE) method for the article recommendation task, which models the user constant preference and instant interest from user-clicked articles. Specifically, we first employ a user constant viewing flow modeling method to summarize the user's general interest to recommend articles. In this case, we utilize Large Language Models (LLMs) to capture constant user preferences from previously clicked articles, such as skills and positions. Then we design the user instant viewing flow modeling method to build interactions between user-clicked article history and candidate articles. It attentively reads the representations of user-clicked articles and aims to learn the user's different interest views to match the candidate article. Our experimental results on the Alibaba Technology Association (ATA) website show the advantage of SINGLE, achieving a 2.4% improvement over previous baseline models in the online A/B test. Our further analyses illustrate that SINGLE has the ability to build a more tailored recommendation system by mimicking different article viewing behaviors of users and recommending more appropriate and diverse articles to match user interests.


Hierarchically Fusing Long and Short-Term User Interests for Click-Through Rate Prediction in Product Search

arXiv.org Artificial Intelligence

Estimating Click-Through Rate (CTR) is a vital yet challenging task in personalized product search. However, existing CTR methods still struggle in the product search settings due to the following three challenges including how to more effectively extract users' short-term interests with respect to multiple aspects, how to extract and fuse users' long-term interest with short-term interests, how to address the entangling characteristic of long and short-term interests. To resolve these challenges, in this paper, we propose a new approach named Hierarchical Interests Fusing Network (HIFN), which consists of four basic modules namely Short-term Interests Extractor (SIE), Long-term Interests Extractor (LIE), Interests Fusion Module (IFM) and Interests Disentanglement Module (IDM). Specifically, SIE is proposed to extract user's short-term interests by integrating three fundamental interests encoders within it namely query-dependent, target-dependent and causal-dependent interest encoder, respectively, followed by delivering the resultant representation to the module LIE, where it can effectively capture user long-term interests by devising an attention mechanism with respect to the short-term interests from SIE module. In IFM, the achieved long and short-term interests are further fused in an adaptive manner, followed by concatenating it with original raw context features for the final prediction result. Last but not least, considering the entangling characteristic of long and short-term interests, IDM further devises a self-supervised framework to disentangle long and short-term interests. Extensive offline and online evaluations on a real-world e-commerce platform demonstrate the superiority of HIFN over state-of-the-art methods.


MOEF: Modeling Occasion Evolution in Frequency Domain for Promotion-Aware Click-Through Rate Prediction

arXiv.org Artificial Intelligence

Promotions are becoming more important and prevalent in e-commerce to attract customers and boost sales, leading to frequent changes of occasions, which drives users to behave differently. In such situations, most existing Click-Through Rate (CTR) models can't generalize well to online serving due to distribution uncertainty of the upcoming occasion. In this paper, we propose a novel CTR model named MOEF for recommendations under frequent changes of occasions. Firstly, we design a time series that consists of occasion signals generated from the online business scenario. Since occasion signals are more discriminative in the frequency domain, we apply Fourier Transformation to sliding time windows upon the time series, obtaining a sequence of frequency spectrum which is then processed by Occasion Evolution Layer (OEL). In this way, a high-order occasion representation can be learned to handle the online distribution uncertainty. Moreover, we adopt multiple experts to learn feature representations from multiple aspects, which are guided by the occasion representation via an attention mechanism. Accordingly, a mixture of feature representations is obtained adaptively for different occasions to predict the final CTR. Experimental results on real-world datasets validate the superiority of MOEF and online A/B tests also show MOEF outperforms representative CTR models significantly.


SAR-Net: A Scenario-Aware Ranking Network for Personalized Fair Recommendation in Hundreds of Travel Scenarios

arXiv.org Artificial Intelligence

The travel marketing platform of Alibaba serves an indispensable role for hundreds of different travel scenarios from Fliggy, Taobao, Alipay apps, etc. To provide personalized recommendation service for users visiting different scenarios, there are two critical issues to be carefully addressed. First, since the traffic characteristics of different scenarios, it is very challenging to train a unified model to serve all. Second, during the promotion period, the exposure of some specific items will be re-weighted due to manual intervention, resulting in biased logs, which will degrade the ranking model trained using these biased data. In this paper, we propose a novel Scenario-Aware Ranking Network (SAR-Net) to address these issues. SAR-Net harvests the abundant data from different scenarios by learning users' cross-scenario interests via two specific attention modules, which leverage the scenario features and item features to modulate the user behavior features, respectively. Then, taking the encoded features of previous module as input, a scenario-specific linear transformation layer is adopted to further extract scenario-specific features, followed by two groups of debias expert networks, i.e., scenario-specific experts and scenario-shared experts. They output intermediate results independently, which are further fused into the final result by a multi-scenario gating module. In addition, to mitigate the data fairness issue caused by manual intervention, we propose the concept of Fairness Coefficient (FC) to measures the importance of individual sample and use it to reweigh the prediction in the debias expert networks. Experiments on an offline dataset covering over 80 million users and 1.55 million travel items and an online A/B test demonstrate the effectiveness of our SAR-Net and its superiority over state-of-the-art methods.


Hierarchically Modeling Micro and Macro Behaviors via Multi-Task Learning for Conversion Rate Prediction

arXiv.org Artificial Intelligence

Conversion Rate (\emph{CVR}) prediction in modern industrial e-commerce platforms is becoming increasingly important, which directly contributes to the final revenue. In order to address the well-known sample selection bias (\emph{SSB}) and data sparsity (\emph{DS}) issues encountered during CVR modeling, the abundant labeled macro behaviors ($i.e.$, user's interactions with items) are used. Nonetheless, we observe that several purchase-related micro behaviors ($i.e.$, user's interactions with specific components on the item detail page) can supplement fine-grained cues for \emph{CVR} prediction. Motivated by this observation, we propose a novel \emph{CVR} prediction method by Hierarchically Modeling both Micro and Macro behaviors ($HM^3$). Specifically, we first construct a complete user sequential behavior graph to hierarchically represent micro behaviors and macro behaviors as one-hop and two-hop post-click nodes. Then, we embody $HM^3$ as a multi-head deep neural network, which predicts six probability variables corresponding to explicit sub-paths in the graph. They are further combined into the prediction targets of four auxiliary tasks as well as the final $CVR$ according to the conditional probability rule defined on the graph. By employing multi-task learning and leveraging the abundant supervisory labels from micro and macro behaviors, $HM^3$ can be trained end-to-end and address the \emph{SSB} and \emph{DS} issues. Extensive experiments on both offline and online settings demonstrate the superiority of the proposed $HM^3$ over representative state-of-the-art methods.


Conversion Rate Prediction via Post-Click Behaviour Modeling

arXiv.org Machine Learning

Effective and efficient recommendation is crucial for modern e-commerce platforms. It consists of two indispensable components named Click-Through Rate (CTR) prediction and Conversion Rate (CVR) prediction, where the latter is an essential factor contributing to the final purchasing volume. Existing methods specifically predict CVR using the clicked and purchased samples, which has limited performance affected by the well-known sample selection bias and data sparsity issues. To address these issues, we propose a novel deep CVR prediction method by considering the post-click behaviors. After grouping deterministic actions together, we construct a novel sequential path, which elaborately depicts the post-click behaviors of users. Based on the path, we define the CVR and several related probabilities including CTR, etc., and devise a deep neural network with multiple targets involved accordingly. It takes advantage of the abundant samples with deterministic labels derived from the post-click actions, leading to a significant improvement of CVR prediction. Extensive experiments on both offline and online settings demonstrate its superiority over representative state-of-the-art methods.


Multi-Level Deep Cascade Trees for Conversion Rate Prediction

arXiv.org Machine Learning

Developing effective and efficient recommendation methods is very challenging for modern e-commerce platforms (e.g. Taobao). Generally speaking, two essential modules named "Click-Through Rate Prediction" (CTR) and "Conversion Rate Prediction" (CVR) are included, where CVR module is a crucial factor that affects the final purchasing volume directly. However, it is indeed very challenging due to its sparseness nature. In this paper, we tackle this problem by proposing multi-Level Deep Cascade Trees (ldcTree), which is a novel decision tree ensemble approach. It leverages deep cascade structures by stacking Gradient Boosting Decision Trees (GBDT) to effectively learn feature representation. In addition, we propose to utilize the cross-entropy in each tree of the preceding GBDT as the input feature representation for next level GBDT, which has a clear explanation, i.e., a traversal from root to leaf nodes in the next level GBDT corresponds to the combination of certain traversals in the preceding GBDT. The deep cascade structure and the combination rule enable the proposed ldcTree to have a stronger distributed feature representation ability. Moreover, inspired by ensemble learning, we propose an Ensemble ldcTree (E-ldcTree) to encourage the model's diversity and enhance the representation ability further. Finally, we propose an improved Feature learning method based on EldcTree (F-EldcTree) for taking adequate use of weak and strong correlation features identified by pre-trained GBDT models. Experimental results on off-line dataset and online deployment demonstrate the effectiveness of the proposed methods.