Yao, Tiansheng
Improving Multi-Task Generalization via Regularizing Spurious Correlation
Hu, Ziniu, Zhao, Zhe, Yi, Xinyang, Yao, Tiansheng, Hong, Lichan, Sun, Yizhou, Chi, Ed H.
Multi-Task Learning (MTL) is a powerful learning paradigm for improving generalization performance via knowledge sharing. However, existing studies find that MTL can sometimes hurt generalization, especially when two tasks are less correlated. One possible cause is spurious correlation: some knowledge is spurious and not causally related to task labels, but the model may mistakenly rely on it and thus fail when such correlations change. In the MTL setup, spurious correlation poses several unique challenges. First, the risk of encoding non-causal knowledge is higher, as the shared MTL model needs to encode knowledge from all tasks, and causal knowledge for one task can be spurious for another. Second, confounders between task labels introduce a different type of spurious correlation to MTL. We theoretically prove that MTL is more prone than single-task learning to absorbing non-causal knowledge from other tasks, and thus generalizes worse. To address this problem, we propose the Multi-Task Causal Representation Learning framework, which represents multi-task knowledge via disentangled neural modules and learns which modules are causally related to each task via an MTL-specific invariant regularization. Experiments show that it enhances the MTL model's performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScape, and NYUv2 by alleviating the spurious correlation problem.
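The abstract's invariant regularization builds on the general idea behind Invariant Risk Minimization (IRM); as a rough illustration only (the paper's MTL-specific regularizer differs, and the function name and numbers below are hypothetical), here is the standard IRMv1-style penalty for squared loss, which is zero exactly when the predictor is already optimal within an environment:

```python
import numpy as np

def irm_penalty(logits, labels):
    """IRMv1-style invariance penalty for squared loss, in closed form:
    the squared gradient of the risk w.r.t. a scalar scaling w of the
    logits, evaluated at w = 1."""
    grad = np.mean(2.0 * (logits - labels) * logits)
    return grad ** 2

# Per-environment penalties: a predictor that is optimal in every
# environment incurs zero penalty in each of them.
env_a = irm_penalty(np.array([0.9, -1.1]), np.array([1.0, -1.0]))
env_b = irm_penalty(np.array([1.0, -1.0]), np.array([1.0, -1.0]))
```

Summing such penalties across tasks or environments, and adding the sum to the task losses, penalizes features whose predictive relationship to the labels is not stable.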
Self-supervised Learning for Large-scale Item Recommendations
Yao, Tiansheng, Yi, Xinyang, Cheng, Derek Zhiyuan, Yu, Felix, Chen, Ting, Menon, Aditya, Hong, Lichan, Chi, Ed H., Tjoa, Steve, Kang, Jieqi, Ettinger, Evan
Large-scale recommender models find the most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model an input space with large-vocabulary categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items, the power-law distribution of user feedback makes labels very sparse for a large number of long-tail items. Inspired by the recent success of self-supervised representation learning in both computer vision and natural language understanding, we propose a multi-task self-supervised learning (SSL) framework for large-scale item recommendations. The framework is designed to tackle the label sparsity problem by learning more robust item representations. Furthermore, we propose two self-supervised tasks applicable to models with categorical features within the proposed framework: (i) Feature Masking (FM) and (ii) Feature Dropout (FD). We evaluate our framework on two large-scale datasets with 500M and 1B training examples, respectively. Our results demonstrate that the proposed framework outperforms traditional supervised-learning-only models and state-of-the-art regularization techniques in the context of item recommendations. The SSL framework shows larger improvements with less supervision compared to the counterparts. We also apply the proposed techniques to a web-scale commercial app-to-app recommendation system and significantly improve top-tier business metrics via A/B experiments on live traffic. Our online results also verify our hypothesis that the framework improves model performance on slices that lack supervision.
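The Feature Dropout task named in the abstract can be pictured with a minimal sketch (the function name, shapes, and the use of 0 as a hypothetical "missing feature" sentinel are all illustrative, not the paper's implementation): randomly dropping categorical feature values yields two differently corrupted views of the same items, which a contrastive SSL task can then pull together.

```python
import numpy as np

def feature_dropout(item_features, drop_rate, rng):
    """Randomly mask categorical feature IDs (replaced by the sentinel 0,
    standing in for an "empty" embedding) to create an augmented view."""
    keep = rng.random(item_features.shape) >= drop_rate
    return np.where(keep, item_features, 0)

rng = np.random.default_rng(0)
items = rng.integers(1, 1000, size=(4, 6))  # 4 items, 6 categorical feature IDs
view_a = feature_dropout(items, 0.3, rng)
view_b = feature_dropout(items, 0.3, rng)
# The two views generally disagree on which features survive; a contrastive
# loss treats (view_a[i], view_b[i]) as a positive pair.
```

Feature Masking follows the same pattern but splits the feature set into complementary halves instead of dropping features independently.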
Efficient Subspace Segmentation via Quadratic Programming
Wang, Shusen (Zhejiang University) | Yuan, Xiaotong (National University of Singapore) | Yao, Tiansheng (Zhejiang University) | Yan, Shuicheng (National University of Singapore) | Shen, Jialie (Singapore Management University)
We explore in this paper efficient algorithmic solutions to robust subspace segmentation. We propose SSQP, namely Subspace Segmentation via Quadratic Programming, to partition data drawn from multiple subspaces into multiple clusters. The basic idea of SSQP is to express each datum as a linear combination of the other data, regularized by an overall term targeting zero reconstruction coefficients over vectors from different subspaces. The coefficient matrix derived by solving a quadratic programming problem is taken as an affinity matrix, upon which spectral clustering is applied to obtain the ultimate segmentation result. Similar to sparse subspace clustering (SSC) and low-rank representation (LRR), SSQP is robust to data noise, as validated by experiments on toy data. Experiments on the Hopkins 155 database show that SSQP achieves accuracy competitive with SSC and LRR in segmenting affine subspaces, while experimental results on the Extended Yale Face Database B demonstrate SSQP's superiority over SSC and LRR. Beyond segmentation accuracy, all experiments show that SSQP is much faster than both SSC and LRR in the practice of subspace segmentation.
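The self-expression-then-spectral-clustering pipeline described above can be sketched as follows. This is a simplified stand-in, not the paper's method: a ridge-regularized least-squares step replaces SSQP's actual quadratic program (which uses its own regularizer and constraints), and only the construction of the affinity matrix is shown.

```python
import numpy as np

def self_expression(X, lam=0.1):
    """Express each column of X as a combination of the other columns,
    with the coefficient on itself forced to zero. Ridge regression here
    stands in for SSQP's quadratic program."""
    n = X.shape[1]
    Z = np.zeros((n, n))
    G = X.T @ X + lam * np.eye(n)
    for i in range(n):
        idx = [j for j in range(n) if j != i]        # exclude the datum itself
        coeffs = np.linalg.solve(G[np.ix_(idx, idx)], X[:, idx].T @ X[:, i])
        Z[idx, i] = coeffs
    return Z

rng = np.random.default_rng(1)
# Toy data: 5 points on each of two 1-D subspaces in R^3.
basis1, basis2 = rng.standard_normal((3, 1)), rng.standard_normal((3, 1))
X = np.hstack([basis1 @ rng.standard_normal((1, 5)),
               basis2 @ rng.standard_normal((1, 5))])
Z = self_expression(X)
W = np.abs(Z) + np.abs(Z).T   # symmetric affinity matrix
```

Running spectral clustering on `W` would then produce the final segmentation, as in SSC and LRR.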