tscan
TSCAN: Context-Aware Uplift Modeling via Two-Stage Training for Online Merchant Business Diagnosis
Zhang, Hangtao, Li, Zhe, Zhang, Kairui
A primary challenge in ITE estimation is sample selection bias. Traditional approaches utilize treatment regularization techniques such as the Integral Probability Metrics (IPM), re-weighting, and propensity score modeling to mitigate this bias. However, these regularizations may introduce undesirable information loss and limit the performance of the model. Furthermore, treatment effects vary across different external contexts, and the existing methods are insufficient in fully interacting with and utilizing these contextual features. To address these issues, we propose a Context-Aware uplift model based on the Two-Stage training approach (TSCAN), comprising CAN-U and CAN-D sub-models. In the first stage, we train an uplift model, called CAN-U, which includes the treatment regularizations of IPM and propensity score prediction, to generate a complete dataset with counterfactual uplift labels. In the second stage, we train a model named CAN-D, which utilizes an isotonic output layer to directly model uplift effects, thereby eliminating the reliance on the regularization components. CAN-D adaptively corrects the errors estimated by CAN-U through reinforcing the factual samples, while avoiding the negative impacts associated with the aforementioned regularizations. Additionally, we introduce a Context-Aware Attention Layer throughout the two-stage process to manage the interactions between treatment, merchant, and contextual features, thereby modeling the varying treatment effect in different contexts. We conduct extensive experiments on two real-world datasets to validate the effectiveness of TSCAN. Ultimately, the deployment of our model for real-world merchant diagnosis on one of China's largest online food ordering platforms validates its practical utility and impact.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (6 more...)
TSCAN : Dialog Structure discovery using SCAN
Can we discover dialog structure by dividing utterances into labelled clusters. Can these labels be generated from the data. Typically for dialogs we need an ontology and use that to discover structure, however by using unsupervised classification and self-labelling we are able to intuit this structure without any labels or ontology. In this paper we apply SCAN (Semantic Clustering using Nearest Neighbors) to dialog data. We used BERT for pretext task and an adaptation of SCAN for clustering and self labeling. These clusters are used to identify transition probabilities and create the dialog structure. The self-labelling method used for SCAN makes these structures interpretable as every cluster has a label. As the approach is unsupervised, evaluation metrics is a challenge, we use statistical measures as proxies for structure quality