Lian, Defu
Sparse Modeling-Based Sequential Ensemble Learning for Effective Outlier Detection in High-Dimensional Numeric Data
Pang, Guansong (University of Technology Sydney) | Cao, Longbing (University of Technology Sydney) | Chen, Ling (University of Technology Sydney) | Lian, Defu (University of Electronic Science and Technology of China) | Liu, Huan (Arizona State University)
The large proportion of irrelevant or noisy features in real-life high-dimensional data presents a significant challenge to subspace/feature selection-based high-dimensional outlier detection (a.k.a. outlier scoring) methods. These methods often perform two dependent tasks, relevant feature subset search and outlier scoring, independently of each other. As a result, they retain features/subspaces that are irrelevant to the scoring method, which degrades detection performance. This paper introduces a novel sequential ensemble-based framework, SEMSE, and its instance, CINFO, to address this issue. SEMSE learns sequential ensembles that mutually refine feature selection and outlier scoring through iterative sparse modeling, with the outlier scores serving as the pseudo target feature. CINFO instantiates SEMSE by using three successive recurrent components to build such sequential ensembles. Given outlier scores output by an existing outlier scoring method on a feature subset, CINFO first defines a Cantelli's inequality-based outlier thresholding function to select outlier candidates with a false positive upper bound. It then performs lasso-based sparse regression on the outlier candidate set, treating the outlier scores as the target feature and the original features as predictors, to obtain a feature subset that is tailored to the outlier scoring method. Our experiments show that two different outlier scoring methods enabled by CINFO (i) perform significantly better on 11 real-life high-dimensional data sets, and (ii) are much more resilient to noisy features, compared to their bare versions and three state-of-the-art competitors. The source code of CINFO is available at https://sites.google.com/site/gspangsite/sourcecode.
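The two steps described above can be illustrated with a minimal sketch. It is not the authors' CINFO implementation: the function names, the toy data, and the parameters `a` and `lam` are assumptions made for illustration, and the lasso solver is a plain coordinate-descent routine without an intercept. The thresholding relies on Cantelli's inequality, P(X - mu >= a * sigma) <= 1 / (1 + a^2) for a > 0, so the threshold mu + a * sigma caps the expected fraction of points that exceed it.

```python
from statistics import mean, pstdev

def cantelli_bound(a):
    """Upper bound on P(score >= mean + a * std) from Cantelli's inequality (a > 0)."""
    return 1.0 / (1.0 + a * a)

def cantelli_candidates(scores, a):
    """Indices of points whose outlier score is at least mean + a * std."""
    threshold = mean(scores) + a * pstdev(scores)
    return [i for i, s in enumerate(scores) if s >= threshold]

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return w

# Toy data (hypothetical): feature 0 tracks the outlier score, feature 1 is noise.
X = [[0.1, (-1) ** i] for i in range(12)] + [[3, 1], [4, -1], [5, 1], [6, -1]]
scores = [0.1] * 12 + [3.0, 4.0, 5.0, 6.0]

# Step 1: select outlier candidates with a Cantelli-style threshold.
candidates = cantelli_candidates(scores, a=0.5)

# Step 2: sparse regression on the candidates, with outlier scores as the target.
w = lasso_cd([X[i] for i in candidates], [scores[i] for i in candidates], lam=0.5)
selected = [j for j, wj in enumerate(w) if wj != 0.0]

print("candidates:", candidates)
print("selected features:", selected)
```

In a sequential ensemble of this kind, the selected features would be passed back to the outlier scoring method and the two steps repeated; that loop is omitted here.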
Attention-Based Transactional Context Embedding for Next-Item Recommendation
Wang, Shoujin (University of Technology Sydney) | Hu, Liang (University of Technology Sydney) | Cao, Longbing (University of Technology Sydney) | Huang, Xiaoshui (University of Technology Sydney) | Lian, Defu (University of Electronic Science and Technology of China) | Liu, Wei (University of Technology Sydney)
Recommending the next item to a user in a transactional context is practical yet challenging in applications such as marketing campaigns. Transactional context refers to the items that are observable in a transaction. Most existing transaction-based recommender systems (TBRSs) make recommendations by mainly considering recently occurring items instead of all the items observed in the current context. Moreover, they often assume a rigid order between items within a transaction, which is not always realistic. More importantly, a long transaction often contains many items irrelevant to the next choice, which tend to overwhelm the influence of the few truly relevant ones. Therefore, we posit that a good TBRS should not only consider all the observed items in the current transaction but also weight them by their relevance, to build an attentive context that outputs the proper next item with high probability. To this end, we design an effective attention-based transaction embedding model (ATEM) for context embedding, which weights each observed item in a transaction without assuming order. An empirical study on real-world transaction datasets shows that ATEM significantly outperforms state-of-the-art methods in terms of both accuracy and novelty.
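The idea of weighting every observed item without assuming an order can be sketched as a softmax-weighted sum of item embeddings. This is a generic attention-pooling illustration, not the ATEM architecture itself: the embeddings, the `attention_vector` parameter, and the function names are hypothetical, and in a real model both the embeddings and the attention parameters would be learned.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_context(item_embeddings, attention_vector):
    """Return (weights, context): one weight per item and their weighted sum."""
    # Relevance score of each item: dot product with the attention vector.
    scores = [sum(e_k * a_k for e_k, a_k in zip(e, attention_vector))
              for e in item_embeddings]
    weights = softmax(scores)
    dim = len(attention_vector)
    # A weighted sum is a set operation, so item order does not matter.
    context = [sum(w * e[k] for w, e in zip(weights, item_embeddings))
               for k in range(dim)]
    return weights, context

# Three items observed in a transaction, as toy 2-d embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = [2.0, 0.0]  # favors items with a large first component

weights, context = attentive_context(embeddings, attention)
_, context_shuffled = attentive_context(list(reversed(embeddings)), attention)

print("weights:", weights)
print("context:", context)
```

Reversing the item list yields the same context vector up to floating-point rounding, which is the order-free property the abstract describes.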
Discrete Personalized Ranking for Fast Collaborative Filtering from Implicit Feedback
Zhang, Yan (University of Electronic Science and Technology of China) | Lian, Defu (University of Electronic Science and Technology of China) | Yang, Guowu (University of Electronic Science and Technology of China)
Personalized ranking is usually considered the ultimate goal of recommendation systems, but it suffers from efficiency issues when making recommendations. To address this, we propose a learning-based hashing framework called Discrete Personalized Ranking (DPR), which maps users and items to a Hamming space where user-item affinity can be efficiently calculated via Hamming distance. Because of the discrete constraints, binary codes can be learned with a two-stage procedure, as in most existing methods. This two-stage procedure consists of relaxed optimization, which discards the discrete constraints, followed by binary quantization. However, such a procedure has been shown to result in a large quantization loss, so that longer binary codes are required. DPR instead directly tackles the discrete optimization problem of personalized ranking. Balance and decorrelation constraints are imposed on the binary codes to derive compact but informative codes. In evaluations on several datasets, the proposed framework is consistently superior to the competing baselines, even when using shorter binary codes.
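The efficiency argument can be made concrete with a short sketch of ranking by Hamming distance. It only illustrates how affinity is computed once binary codes exist; it does not show how DPR learns the codes. The 8-bit codes, item names, and function names are hypothetical. For codes read as vectors in {-1, +1}^r, the inner product equals r - 2 * (Hamming distance), so a smaller distance means a higher affinity.

```python
def hamming(a, b):
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")

def affinity(a, b, r):
    """Inner product of the two r-bit codes viewed as {-1, +1} vectors."""
    return r - 2 * hamming(a, b)

def rank_items(user_code, item_codes):
    """Item names sorted from closest to farthest in Hamming space."""
    return sorted(item_codes, key=lambda name: hamming(user_code, item_codes[name]))

R = 8  # code length in bits (toy value)
user = 0b10110010
items = {
    "a": 0b10110011,  # differs from the user code in 1 bit
    "b": 0b01001101,  # bitwise complement of the user code
    "c": 0b10100110,  # differs from the user code in 2 bits
}

ranking = rank_items(user, items)
print("ranking:", ranking)
print("affinities:", {name: affinity(user, code, R) for name, code in items.items()})
```

Each comparison is one XOR and one bit count, which is why shorter codes translate directly into cheaper recommendation.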