
Collaborating Authors

He, Xiangnan


Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors

arXiv.org Artificial Intelligence

Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches cannot consider error position and type simultaneously, and thus fail to integrate both kinds of error information. In this paper, we propose the Fine-Grained Translation Error Detection (FG-TED) task, which aims to identify both the position and the type of translation errors in given source-hypothesis sentence pairs. We also build an FG-TED model to predict addition and omission errors, two typical translation accuracy errors. First, we adopt a word-level classification paradigm to form our model and apply shortcut-learning reduction to mitigate the influence of monolingual features. In addition, we construct synthetic datasets for model training and resolve labeling disagreements in authoritative datasets, making the experimental benchmark consistent. Experiments show that our model identifies error type and position concurrently and achieves state-of-the-art results on the restored dataset. It also delivers more reliable predictions than existing baselines in low-resource and transfer scenarios. The related datasets and source code will be released in the future.
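
As an illustration of the word-level classification paradigm the abstract describes, here is a minimal sketch (hypothetical; not the authors' released FG-TED model) that tags each token of a concatenated source-hypothesis pair with an error label, so error position and type are predicted jointly:

```python
# Hypothetical sketch of word-level translation-error tagging. Every token of
# the concatenated source-hypothesis pair gets one of three labels: OK,
# ADDITION (extra hypothesis token), or OMISSION (uncovered source token).
import torch
import torch.nn as nn

class WordLevelErrorTagger(nn.Module):
    def __init__(self, vocab_size=30000, hidden=256, num_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)  # OK / ADD / OMIT

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) over the source-hypothesis pair
        h, _ = self.encoder(self.embed(token_ids))
        return self.classifier(h)  # per-token logits: position and type jointly

model = WordLevelErrorTagger()
logits = model(torch.randint(0, 30000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 3])
```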


FFHR: Fully and Flexible Hyperbolic Representation for Knowledge Graph Completion

arXiv.org Artificial Intelligence

Learning hyperbolic embeddings for knowledge graphs (KGs) has gained increasing attention due to their superiority in capturing hierarchies. However, some important operations in hyperbolic space still lack good definitions, so existing methods cannot fully leverage the merits of hyperbolic space. Specifically, they suffer from two main limitations: 1) existing Graph Convolutional Network (GCN) methods in hyperbolic space rely on tangent-space approximation, which incurs approximation error in representation learning, and 2) because hyperbolic space lacks an inner-product definition, existing methods can only measure the plausibility of facts (links) with hyperbolic distance, which makes it difficult to capture complex data patterns. In this work, we contribute: 1) a Full Poincaré Multi-relational GCN that achieves graph information propagation in hyperbolic space without requiring any approximation, and 2) a hyperbolic generalization of the Euclidean inner product that helps capture both hierarchical and complex patterns. On this basis, we further develop a Fully and Flexible Hyperbolic Representation framework (FFHR) that can transfer recent Euclidean-based advances to hyperbolic space. We demonstrate this by instantiating FFHR with four representative KGC methods. Extensive experiments on benchmark datasets validate the superiority of our FFHRs over their Euclidean counterparts as well as state-of-the-art hyperbolic embedding methods.
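
For context on the distance-based scoring the abstract contrasts against, this is the standard Poincaré-ball distance commonly used by hyperbolic KG embeddings; FFHR's full operators, including its hyperbolic inner product, go beyond this sketch:

```python
# Standard Poincaré-ball distance (illustrative background, not FFHR itself):
# d(x, y) = arcosh(1 + 2 * ||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))
import torch

def poincare_distance(x, y, eps=1e-6):
    # x, y: points strictly inside the unit ball, shape (..., dim)
    sq = torch.sum((x - y) ** 2, dim=-1)
    nx = torch.clamp(1 - torch.sum(x ** 2, dim=-1), min=eps)
    ny = torch.clamp(1 - torch.sum(y ** 2, dim=-1), min=eps)
    return torch.acosh(1 + 2 * sq / (nx * ny))

x = torch.tensor([0.1, 0.2])
y = torch.tensor([0.3, -0.1])
print(poincare_distance(x, y))
```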


Unbiased Knowledge Distillation for Recommendation

arXiv.org Artificial Intelligence

As a promising solution for model compression, knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency. Traditional solutions first train a full teacher model on the training data and then transfer its knowledge (i.e., soft labels) to supervise the learning of a compact student model. However, we find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended even more heavily after distillation. This effect prevents the student model from making accurate and fair recommendations, decreasing the effectiveness of RS. In this work, we identify the origin of the bias in KD: it is rooted in the biased soft labels from the teacher and is further propagated and intensified during distillation. To rectify this, we propose a new KD method with a stratified distillation strategy. It first partitions items into multiple groups according to their popularity, and then extracts the ranking knowledge within each group to supervise the learning of the student. Our method is simple and teacher-agnostic: it operates at the distillation stage without affecting the training of the teacher model. We conduct extensive theoretical and empirical studies to validate the effectiveness of our proposal. We release our code at: https://github.com/chengang95/UnKD.
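
A minimal sketch of the stratified idea, assuming equal-size popularity strata and a top-k teacher ranking per stratum (the released UnKD code is the authoritative reference):

```python
# Illustrative popularity-stratified distillation targets (not the UnKD code):
# partition items by popularity, then take the teacher's top-ranked items
# *within each stratum* as soft supervision, so popular items cannot dominate.
import numpy as np

def stratify_items(popularity, num_groups=3):
    order = np.argsort(-popularity)            # most popular first
    return np.array_split(order, num_groups)   # equal-size popularity strata

def per_group_teacher_targets(teacher_scores, groups, k=1):
    targets = []
    for g in groups:
        ranked = g[np.argsort(-teacher_scores[g])]
        targets.extend(ranked[:k].tolist())    # top-k inside each stratum
    return targets

pop = np.array([100, 90, 50, 40, 10, 5])
scores = np.random.rand(6)
print(per_group_teacher_targets(scores, stratify_items(pop)))
```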


Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network

arXiv.org Artificial Intelligence

Knowledge Graphs (KGs) are becoming increasingly essential infrastructure in many applications while suffering from incompleteness issues. The KG completion (KGC) task automatically predicts missing facts based on an incomplete KG. However, existing methods perform unsatisfactorily in real-world scenarios. On the one hand, their performance degrades dramatically as KGs become sparser. On the other hand, the inference procedure for prediction is an untrustworthy black box. This paper proposes HoGRN, a novel explainable model for sparse KGC that composes high-order reasoning into a graph convolutional network. It not only improves generalization to mitigate the information-insufficiency issue but also provides interpretability while maintaining effectiveness and efficiency. Two main components are seamlessly integrated for joint optimization. First, the high-order reasoning component learns high-quality relation representations by capturing endogenous correlations among relations, which can reflect logical rules that justify a broader range of missing facts. Second, the entity-updating component leverages a weight-free Graph Convolutional Network (GCN) to efficiently model KG structures with interpretability. Unlike conventional methods, we conduct entity aggregation and design composition-based attention in the relational space without additional parameters. This lightweight design makes HoGRN well suited to sparse settings. For evaluation, we conduct extensive experiments: the results of HoGRN on several sparse KGs show impressive improvements (9% MRR gain on average). Further ablation and case studies demonstrate the effectiveness of the main components. Our code will be released upon acceptance.
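
To make the weight-free, composition-based aggregation concrete, here is a toy sketch (an assumed simplification, not HoGRN's exact design) that composes neighbor and relation vectors without any extra transformation matrices:

```python
# Toy weight-free, composition-based neighbor aggregation (hypothetical
# simplification in the spirit of HoGRN): entity and relation vectors are
# composed element-wise, then combined via attention with no learned weights.
import torch
import torch.nn.functional as F

def aggregate(entity, neighbors, relations):
    # neighbors, relations: (num_edges, dim)
    messages = neighbors * relations           # parameter-free composition
    att = F.softmax(messages @ entity, dim=0)  # attention in relational space
    return entity + att.unsqueeze(-1).mul(messages).sum(0)

e = torch.randn(8)
print(aggregate(e, torch.randn(4, 8), torch.randn(4, 8)).shape)  # torch.Size([8])
```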


Deconfounded Training for Graph Neural Networks

arXiv.org Artificial Intelligence

Learning powerful representations is one central theme of graph neural networks (GNNs). It requires refining the critical information from the input graph, rather than trivial patterns, to enrich the representations. Toward this end, graph attention and pooling methods prevail. They mostly follow the paradigm of "learning to attend", which maximizes the mutual information between the attended subgraph and the ground-truth label. However, this training paradigm is prone to capturing spurious correlations between the trivial subgraph and the label. Such spurious correlations benefit in-distribution (ID) test evaluations but cause poor generalization on out-of-distribution (OOD) test data. In this work, we revisit GNN modeling from a causal perspective. Under our causal assumption, the trivial information serves as a confounder between the critical information and the label, opening a backdoor path between them and making them spuriously correlated. Hence, we present a new deconfounded training paradigm (DTP) that better mitigates the confounding effect and latches onto the critical information, enhancing representation and generalization ability. Specifically, we adopt attention modules to disentangle the critical and trivial subgraphs. We then make each critical subgraph interact fairly with diverse trivial subgraphs to achieve stable predictions. This allows GNNs to capture a more reliable subgraph whose relation with the label is robust across different distributions. We conduct extensive experiments on synthetic and real-world datasets to demonstrate the effectiveness of DTP.
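
A toy sketch of the key step (hypothetical tensors and prediction head; the paper's attention-based disentanglement is not shown): predictions from a critical representation are averaged over diverse trivial contexts, mirroring backdoor adjustment:

```python
# Illustrative deconfounded prediction (assumed simplification): stratify over
# trivial subgraph representations so the prediction from the critical
# subgraph is stable across them, i.e. P(y | do(c)) ~ mean_t P(y | c, t).
import torch
import torch.nn as nn

critical_dim, trivial_dim, num_classes = 16, 16, 3
head = nn.Linear(critical_dim + trivial_dim, num_classes)

def deconfounded_logits(critical_repr, trivial_bank):
    # critical_repr: (1, critical_dim); trivial_bank: (T, trivial_dim)
    expanded = critical_repr.expand(len(trivial_bank), -1)
    return head(torch.cat([expanded, trivial_bank], dim=-1)).mean(0)

print(deconfounded_logits(torch.randn(1, 16), torch.randn(5, 16)).shape)
```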


Learning Robust Recommender from Noisy Implicit Feedback

arXiv.org Artificial Intelligence

The ubiquity of implicit feedback makes it indispensable for building recommender systems. However, it does not necessarily reflect users' actual satisfaction. For example, in e-commerce, a large portion of clicks do not translate to purchases, and many purchases end up with negative reviews. It is therefore important to account for the inevitable noise in implicit feedback; however, little work on recommendation has taken this noisy nature into consideration. In this work, we explore the central theme of denoising implicit feedback for recommender learning, covering both training and inference. By observing normal recommender training, we find that noisy feedback typically has large loss values in the early stages. Inspired by this observation, we propose a new training strategy named Adaptive Denoising Training (ADT), which adaptively prunes noisy interactions via two paradigms (i.e., Truncated Loss and Reweighted Loss). Furthermore, we consider extra feedback (e.g., ratings) as an auxiliary signal and propose three strategies to incorporate it into ADT: fine-tuning, warm-up training, and colliding inference. We instantiate the two paradigms on the widely used binary cross-entropy loss and test them on three representative recommender models. Extensive experiments on three benchmarks demonstrate that ADT significantly improves recommendation quality over normal training without using extra feedback. Moreover, the three proposed strategies for using extra feedback largely enhance ADT's denoising ability.
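
The Truncated Loss paradigm can be sketched as a small-loss selection rule on binary cross-entropy; the dynamic drop-rate schedule and the Reweighted Loss variant from the paper are omitted in this assumed simplification:

```python
# Illustrative Truncated Loss (assumed simplification of ADT): interactions
# with the largest per-sample BCE loss are treated as likely noise and dropped
# from the gradient for that training step.
import torch
import torch.nn.functional as F

def truncated_bce(logits, labels, drop_rate=0.2):
    loss = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    keep = int(len(loss) * (1 - drop_rate))
    kept, _ = torch.topk(loss, keep, largest=False)  # keep small-loss samples
    return kept.mean()

print(truncated_bce(torch.randn(10), torch.randint(0, 2, (10,)).float()))
```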


DisenKGAT: Knowledge Graph Embedding with Disentangled Graph Attention Network

arXiv.org Artificial Intelligence

Knowledge graph completion (KGC) has become a focus of attention across the deep learning community owing to its contribution to numerous downstream tasks. Although recent years have witnessed a surge of work on KGC, existing methods are still insufficient to accurately capture complex relations, since they adopt single and static representations. In this work, we propose a novel Disentangled Knowledge Graph Attention Network (DisenKGAT) for KGC, which leverages both micro-disentanglement and macro-disentanglement to exploit the representations behind knowledge graphs (KGs). To achieve micro-disentanglement, we put forward a novel relation-aware aggregation to learn diverse component representations. For macro-disentanglement, we leverage mutual information as a regularizer to enhance independence. With the assistance of disentanglement, our model generates representations adapted to the given scenario. Moreover, our approach is robust and flexible enough to work with various score functions. Extensive experiments on public benchmark datasets validate the superiority of DisenKGAT over existing methods in terms of both accuracy and explainability.
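
A minimal sketch of micro-disentanglement, assuming K component vectors per entity and a soft, relation-aware selection over them (illustrative only; DisenKGAT's aggregation and mutual-information regularizer are more involved):

```python
# Toy relation-aware component update (hypothetical): each entity keeps K
# disentangled component vectors, and a relation softly selects which
# components it updates via attention weights.
import torch
import torch.nn.functional as F

K, dim = 4, 8
entity = torch.randn(K, dim)   # K disentangled components of one entity
relation = torch.randn(dim)

att = F.softmax(entity @ relation, dim=0)        # relation-aware weights, (K,)
updated = entity + att.unsqueeze(-1) * relation  # each component moves by its weight
print(updated.shape)  # torch.Size([4, 8])
```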


Causal Incremental Graph Convolution for Recommender System Retraining

arXiv.org Artificial Intelligence

Real-world recommender systems need to be retrained regularly to keep up with new data. In this work, we consider how to efficiently retrain graph convolution network (GCN) based recommender models, which are state-of-the-art techniques for collaborative recommendation. To pursue high efficiency, we set the target as using only new data for model updating, while not sacrificing recommendation accuracy compared with full model retraining. This is non-trivial to achieve, since the interaction data participates both in the graph structure for model construction and in the loss function for model learning, yet the old graph structure cannot be used in model updating. Toward this goal, we propose a Causal Incremental Graph Convolution approach, which consists of two new operators, Incremental Graph Convolution (IGC) and Colliding Effect Distillation (CED), to estimate the output of full graph convolution. In particular, we devise simple and effective modules for IGC that ingeniously combine the old representations with the incremental graph and effectively fuse long-term and short-term preference signals. CED aims to avoid the out-of-date issue for inactive nodes absent from the incremental graph by connecting the new data with inactive nodes through causal inference. Specifically, CED estimates the causal effect of new data on the representations of inactive nodes through the control of their collider. Extensive experiments on three real-world datasets demonstrate both accuracy gains and significant speed-ups over existing retraining mechanisms.
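
The core of IGC can be sketched as one convolution step over the new-interaction graph fused with cached old representations; the fixed fusion weight alpha here is an assumption, and CED is not shown:

```python
# Illustrative incremental graph convolution (not the paper's exact operator):
# propagate only over the adjacency built from new interactions and fuse the
# result with cached old representations, avoiding the full historical graph.
import torch

def incremental_conv(old_repr, new_adj, alpha=0.5):
    # old_repr: cached embeddings from the previous period, (num_nodes, dim)
    # new_adj: normalized adjacency built from new interactions only
    new_signal = new_adj @ old_repr  # one convolution step on new data
    return alpha * old_repr + (1 - alpha) * new_signal  # long/short-term fuse

old = torch.randn(5, 8)
adj = torch.eye(5)  # identity as a trivial demo adjacency
print(incremental_conv(old, adj).shape)  # torch.Size([5, 8])
```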


Empowering Language Understanding with Counterfactual Reasoning

arXiv.org Artificial Intelligence

Existing language understanding methods have demonstrated an extraordinary ability to recognize patterns in text via machine learning. However, they apply the recognized patterns indiscriminately in the testing phase, unlike humans, who employ counterfactual thinking, e.g., to scrutinize hard testing samples. Inspired by this, we propose a Counterfactual Reasoning Model, which mimics counterfactual thinking by learning from a few counterfactual samples. In particular, we devise a generation module that produces representative counterfactual samples for each factual sample, and a retrospective module that revisits the model prediction by comparing the counterfactual and factual samples. Extensive experiments on sentiment analysis (SA) and natural language inference (NLI) validate the effectiveness of our method.
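
A sketch of the retrospective step under an assumed interface: the factual prediction is adjusted by contrasting it with predictions on generated counterfactual variants (the generation module itself is not shown):

```python
# Hypothetical retrospection: subtract the averaged counterfactual signal so
# the final decision relies on what changed between factual and counterfactual
# inputs rather than on shared surface patterns.
import torch

def retrospect(factual_logits, counterfactual_logits, beta=0.5):
    # factual_logits: (num_classes,); counterfactual_logits: (N, num_classes)
    return factual_logits - beta * counterfactual_logits.mean(dim=0)

print(retrospect(torch.randn(3), torch.randn(4, 3)))
```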


Probabilistic and Variational Recommendation Denoising

arXiv.org Artificial Intelligence

Learning from implicit feedback is one of the most common settings in the application of recommender systems. Generally speaking, interacted examples are considered positive, while negative examples are sampled from uninteracted ones. However, noisy examples are prevalent in real-world implicit feedback: a noisy positive example may be interacted with yet actually reflect negative user preference, and a noisy negative example may be uninteracted with merely because the user was unaware of the item, thus denoting potential positive preference. Conventional training methods overlook these noisy examples, leading to sub-optimal recommendation. In this work, we propose probabilistic and variational recommendation denoising for implicit feedback. Through an empirical study, we find that different models make relatively similar predictions on clean examples, which reflect real user preference, while their predictions on noisy examples vary much more. Motivated by this observation, we propose denoising with probabilistic inference (DPI), which minimizes the KL-divergence between the real user-preference distributions parameterized by two recommendation models while maximizing the likelihood of data observation. We then show that DPI recovers the evidence lower bound of a variational auto-encoder when the real user preference is treated as the latent variable. This leads to our second learning framework, denoising with a variational auto-encoder (DVAE). We apply DPI and DVAE to four state-of-the-art recommendation models and conduct experiments on three datasets. Experimental results demonstrate that DPI and DVAE significantly improve recommendation performance compared with normal training and other denoising methods. Code will be open-sourced.
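
The DPI objective, as described, can be sketched for the binary-feedback case: a KL term aligns two models' preference distributions while a likelihood term fits the observed labels (an assumed form; the exact parameterization and weighting in the paper may differ):

```python
# Illustrative DPI-style objective (assumed form): two recommenders fit the
# observed binary feedback while a Bernoulli KL term pulls their predicted
# preference distributions toward agreement.
import torch
import torch.nn.functional as F

def dpi_loss(logits_a, logits_b, labels, lam=0.1, eps=1e-8):
    nll = F.binary_cross_entropy_with_logits(logits_a, labels) \
        + F.binary_cross_entropy_with_logits(logits_b, labels)
    p, q = torch.sigmoid(logits_a), torch.sigmoid(logits_b)
    kl = (p * torch.log((p + eps) / (q + eps))
          + (1 - p) * torch.log((1 - p + eps) / (1 - q + eps))).mean()
    return nll + lam * kl

print(dpi_loss(torch.randn(6), torch.randn(6), torch.randint(0, 2, (6,)).float()))
```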