Collaborating Authors

 Peng, Chong


Large-Scale Multi-Domain Recommendation: an Automatic Domain Feature Extraction and Personalized Integration Framework

arXiv.org Artificial Intelligence

Feed recommendation is currently the mainstream mode for many real-world applications (e.g., TikTok, Dianping), and it is usually necessary to model and predict user interests in multiple scenarios (domains) within and even outside the application. Multi-domain learning is a typical solution to this problem. While considerable efforts have been made, two long-standing challenges remain: (1) Accurately depicting the differences among domains using domain features is crucial for enhancing the performance of each domain. However, manually designing domain features and models for numerous domains can be a laborious task. (2) Users typically have limited impressions in only a few domains. Automatically extracting features from other domains and leveraging them to improve the predictive capabilities of each domain has remained a challenging problem. In this paper, we propose an Automatic Domain Feature Extraction and Personalized Integration (DFEI) framework for large-scale multi-domain recommendation. The framework automatically transforms the behavior of each individual user into an aggregation of all user behaviors within the domain, which serves as the domain features. Unlike offline feature engineering methods, the extracted domain features are higher-order representations and directly related to the target label. In addition, through personalized integration of domain features from other domains for each user and an innovation in the training mode, the DFEI framework can yield more accurate conversion identification. Experimental results on both public and industrial datasets, consisting of over 20 domains, demonstrate that the proposed framework achieves significantly better performance than SOTA baselines. Furthermore, we have released the source code of the proposed framework at https://github.com/xidongbo/DFEI.


Calibrating the Confidence of Large Language Models by Eliciting Fidelity

arXiv.org Artificial Intelligence

Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence is not well calibrated with their correctness rate. In this paper, we decompose the language model confidence into the Uncertainty about the question and the Fidelity to the answer generated by language models. Then, we propose a plug-and-play method to estimate the confidence of language models. In experiments with 6 RLHF-LMs on four MCQA datasets, our method shows good calibration performance. Moreover, we propose two novel metrics, IPR and CE, to evaluate the calibration of the model, and we provide a detailed discussion of Truly Well-Calibrated Confidence. Our method could serve as a strong baseline, and we hope that this work will provide some insights into model confidence calibration.
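To make the notion of calibration concrete, here is a minimal sketch of the standard expected calibration error (ECE), which bins predictions by stated confidence and compares the mean confidence with the accuracy in each bin. This is a widely used generic metric and not the IPR or CE metrics proposed in the paper, whose definitions the abstract does not give; the function name and the equal-width binning are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE with equal-width bins (illustrative; not the paper's IPR/CE)."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a bin by its stated confidence in [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # Weighted average of |accuracy - mean confidence| over non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# An overconfident model: it states 0.9 confidence but is right 1 time in 4.
overconfident = expected_calibration_error([0.9] * 4, [True, False, False, False])
# A model that states full confidence and is always right.
well_calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

A larger value means a larger gap between stated confidence and observed accuracy, which is the overconfidence pattern the abstract describes.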


Explainable Censored Learning: Finding Critical Features with Long Term Prognostic Values for Survival Prediction

arXiv.org Artificial Intelligence

Interpreting critical variables involved in complex biological processes related to survival time can help understand predictions from survival models, evaluate treatment efficacy, and develop new therapies for patients. Although the predictive results of deep learning (DL)-based models are better than or as good as those of standard survival methods, these models are often disregarded because of their lack of transparency and limited interpretability, which is crucial to their adoption in clinical applications. In this paper, we introduce a novel, easily deployable approach, called EXplainable CEnsored Learning (EXCEL), to iteratively exploit critical variables and simultaneously implement DL model training based on these variables. First, on a toy dataset, we illustrate the principle of EXCEL; then, we mathematically analyze our proposed method, and we derive and prove tight generalization error bounds; next, on two semi-synthetic datasets, we show that EXCEL has good anti-noise ability and stability; finally, we apply EXCEL to a variety of real-world survival datasets including clinical data and genetic data, demonstrating that EXCEL can effectively identify critical features and achieve performance on par with or better than the original models. It is worth pointing out that EXCEL can be flexibly deployed in existing or emerging models for explainable analysis of survival data in the presence of right censoring.


Hyperspectral Image Denoising Using Non-convex Local Low-rank and Sparse Separation with Spatial-Spectral Total Variation Regularization

arXiv.org Artificial Intelligence

In this paper, we propose a novel nonconvex approach to robust principal component analysis (RPCA) for hyperspectral image (HSI) denoising, which focuses on simultaneously developing more accurate approximations to both rank and column-wise sparsity for the low-rank and sparse components, respectively. In particular, the new method adopts the log-determinant rank approximation and a novel $\ell_{2,\log}$ norm to enforce the local low-rank and column-wise sparse properties of the component matrices, respectively. For the $\ell_{2,\log}$-regularized shrinkage problem, we develop an efficient, closed-form solution, named the $\ell_{2,\log}$-shrinkage operator. The new regularization and the corresponding operator can be generally used in other problems that require column-wise sparsity. Moreover, we impose spatial-spectral total variation regularization in the log-based nonconvex RPCA model, which enhances the global piece-wise smoothness and spectral consistency from the spatial and spectral views in the recovered HSI. Extensive experiments on both simulated and real HSIs demonstrate the effectiveness of the proposed method in denoising HSIs.
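As a rough illustration of why a log-determinant surrogate can approximate rank more closely than the nuclear norm, the sketch below compares the two on a given set of singular values. It assumes one common form of the surrogate, log det(I + X^T X) = sum_i log(1 + sigma_i^2); the abstract does not state the exact form used in the paper, and the function names here are hypothetical.

```python
import math


def numerical_rank(singular_values, tol=1e-10):
    """Number of singular values above a small tolerance."""
    return sum(1 for s in singular_values if s > tol)


def nuclear_norm(singular_values):
    """Convex rank surrogate: the sum of singular values."""
    return sum(singular_values)


def logdet_surrogate(singular_values):
    """Assumed nonconvex surrogate: log det(I + X^T X) = sum_i log(1 + sigma_i^2)."""
    return sum(math.log1p(s * s) for s in singular_values)


# Singular values of a hypothetical rank-2 matrix with large leading values.
spectrum = [100.0, 50.0, 0.0]
r = numerical_rank(spectrum)
nuc = nuclear_norm(spectrum)
ld = logdet_surrogate(spectrum)
```

The nuclear norm grows linearly with each singular value, so large singular values dominate it, whereas the logarithm grows slowly and penalizes them far less, which is the usual motivation for log-based rank approximations.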


Kernel Two-Dimensional Ridge Regression for Subspace Clustering

arXiv.org Artificial Intelligence

Subspace clustering methods have been widely studied recently. When the inputs are 2-dimensional (2D) data, existing subspace clustering methods usually convert them into vectors, which severely damages the inherent structures and relationships of the original data. In this paper, we propose a novel subspace clustering method for 2D data. It directly uses 2D data as inputs such that the learning of representations benefits from the inherent structures and relationships of the data. It simultaneously seeks image projection and representation coefficients such that they mutually enhance each other and lead to powerful data representations. An efficient algorithm is developed to solve the proposed objective function, with provable decrease of the objective and convergence properties. Extensive experimental results verify the effectiveness of the new method.


Structured Graph Learning for Clustering and Semi-supervised Classification

arXiv.org Artificial Intelligence

Graphs have become increasingly popular in modeling structures and interactions in a wide variety of problems during the last decade. Graph-based clustering and semi-supervised classification techniques have shown impressive performance. This paper proposes a graph learning framework to preserve both the local and global structure of data. Specifically, our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure. Furthermore, most existing graph-based methods conduct clustering and semi-supervised classification on the graph learned from the original data matrix, which does not have an explicit cluster structure; thus, they might not achieve optimal performance. By imposing a rank constraint, the learned graph will have exactly $c$ connected components if there are $c$ clusters or classes. As a byproduct of this, graph learning and label inference are jointly and iteratively implemented in a principled way. Theoretically, we show that our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions. Extensive experiments on clustering and semi-supervised classification demonstrate that the proposed method outperforms other state-of-the-art methods.
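The link between a rank constraint and connected components rests on a standard fact: for a graph with nonnegative edge weights on $n$ nodes, the multiplicity of the zero eigenvalue of its Laplacian equals the number of connected components, so a graph with exactly $c$ components has a Laplacian of rank $n - c$. Assuming the paper's constraint takes this form (the abstract does not spell it out), the property being enforced can be checked with a simple traversal. The function name and tolerance below are illustrative choices.

```python
from collections import deque


def count_connected_components(weights, tol=1e-12):
    """Count components of an undirected graph given as a dense weight matrix."""
    n = len(weights)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        # A new, unvisited node starts a new component; explore it by BFS.
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                linked = weights[i][j] > tol or weights[j][i] > tol
                if not seen[j] and linked:
                    seen[j] = True
                    queue.append(j)
    return components


# Two clusters with no edges between them: nodes {0, 1} and nodes {2, 3}.
block_graph = [
    [0.0, 0.8, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
c_blocks = count_connected_components(block_graph)

# Adding a single weak cross-cluster edge merges the two components.
leaky_graph = [row[:] for row in block_graph]
leaky_graph[1][2] = leaky_graph[2][1] = 0.01
c_leaky = count_connected_components(leaky_graph)
```

When the learned graph has exactly $c$ components, cluster labels can be read directly from the components, without a separate clustering step on the graph.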


Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering

arXiv.org Machine Learning

In this paper, we propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF. It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step. In particular, projection matrices are sought under the guidance of building new data representations, such that the spatial information is retained and projections are enhanced by the goal of clustering, which helps construct optimal projection directions. Moreover, to exploit nonlinear structures of the data, a manifold is constructed in the projected subspace; it is adaptively updated according to the projections, is less affected by noise and outliers in the data, and is thus more representative in the projected space. Hence, seeking projections, building new data representations, and learning the manifold are seamlessly integrated in a single model, in which they mutually enhance each other and lead to a powerful data representation. Comprehensive experimental results verify the effectiveness of TS-NMF in comparison with several state-of-the-art algorithms, which suggests high potential of the proposed method for real-world applications.


Nonnegative Matrix Factorization with Local Similarity Learning

arXiv.org Machine Learning

Existing nonnegative matrix factorization methods focus on learning the global structure of the data to construct basis and coefficient matrices, which ignores the local structure that commonly exists among data. In this paper, we propose a new type of nonnegative matrix factorization method, which learns local similarity and clustering in a mutually enhancing way. The learned new representation is more representative in that it better reveals the inherent geometric properties of the data. A nonlinear expansion is given, and efficient multiplicative updates are developed with theoretical convergence guarantees. Extensive experimental results have confirmed the effectiveness of the proposed model.
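For readers unfamiliar with multiplicative updates, the sketch below implements the classical Lee-Seung updates for plain NMF under the Frobenius loss, using only the standard library. This is the generic baseline scheme, not the paper's update rules, which additionally involve local similarity learning and are not given in the abstract; all function names and the small `eps` safeguard are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf_multiplicative(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Classical Lee-Seung multiplicative updates for V ~ WH with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# A small nonnegative data matrix (hypothetical values).
V = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
W, H, errors = nmf_multiplicative(V, rank=2)
```

Because each update multiplies the current factor by a nonnegative ratio, nonnegativity is preserved automatically, and the reconstruction error does not increase from one iteration to the next.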


Discriminative Regression Machine: A Classifier for High-Dimensional Data or Imbalanced Data

arXiv.org Machine Learning

We introduce a discriminative regression approach to supervised classification in this paper. It estimates a representation model while accounting for discriminativeness between classes, thereby enabling accurate derivation of categorical information. This new type of regression model extends existing models such as ridge, lasso, and group lasso by explicitly incorporating discriminative information. As a special case, we focus on a quadratic model that admits a closed-form analytical solution. The corresponding classifier is called the discriminative regression machine (DRM). Three iterative algorithms are further established for the DRM to enhance its efficiency and scalability for real applications. Our approach and the algorithms are applicable to general types of data including images, high-dimensional data, and imbalanced data. We compare the DRM with current state-of-the-art classifiers. Our extensive experimental results show superior performance of the DRM and confirm the effectiveness of the proposed approach.


RES-PCA: A Scalable Approach to Recovering Low-rank Matrices

arXiv.org Machine Learning

Robust principal component analysis (RPCA) has drawn significant attention due to its powerful capability in recovering low-rank matrices as well as its successful applications in various real-world problems. The current state-of-the-art algorithms usually need to compute the singular value decomposition of large matrices, which generally has at least quadratic or even cubic complexity. This drawback has limited the application of RPCA in solving real-world problems. To combat this drawback, in this paper we propose a new type of RPCA method, RES-PCA, which is linearly efficient and scalable in both data size and dimension. For comparison, AltProj, an existing scalable approach to RPCA, requires precise knowledge of the true rank; otherwise, it may fail to recover low-rank matrices. By contrast, our method works with or without knowing the true rank; even when both methods work, our method is faster. Extensive experiments have been performed and attest to the effectiveness of the proposed method both quantitatively and in visual quality, which suggests that our method is suitable for use as a light-weight, scalable component for RPCA in any application pipeline.