Lee, Kyumin
MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning
Li, Yichuan, Ma, Xiyao, Lu, Sixing, Lee, Kyumin, Liu, Xiaohu, Guo, Chenlei
Large Language models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities, where a LLM makes predictions for a given test input together with a few input-output pairs (demonstrations). Nevertheless, the inclusion of demonstrations leads to a quadratic increase in the computational overhead of the self-attention mechanism. Existing solutions attempt to distill lengthy demonstrations into compact vectors. However, they often require task-specific retraining or compromise LLM's in-context learning performance. To mitigate these challenges, we present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task. We exploit the knowledge distillation to enhance alignment between MEND and LLM, achieving both efficiency and effectiveness simultaneously. MEND is endowed with the meta-knowledge of distilling demonstrations through a two-stage training process, which includes meta-distillation pretraining and fine-tuning. Comprehensive evaluations across seven diverse ICL task partitions using decoder-only (GPT-2) and encoder-decoder (T5) attest to MEND's prowess. It not only matches but often outperforms the Vanilla ICL as well as other state-of-the-art distillation models, while significantly reducing the computational demands. This innovation promises enhanced scalability and efficiency for the practical deployment of large language models
GRENADE: Graph-Centric Language Model for Self-Supervised Representation Learning on Text-Attributed Graphs
Li, Yichuan, Ding, Kaize, Lee, Kyumin
Self-supervised representation learning on text-attributed graphs, which aims to create expressive and generalizable representations for various downstream tasks, has received increasing research attention lately. However, existing methods either struggle to capture the full extent of structural context information or rely on task-specific training labels, which largely hampers their effectiveness and generalizability in practice. To solve the problem of self-supervised representation learning on text-attributed graphs, we develop a novel Graph-Centric Language model -- GRENADE. Specifically, GRENADE exploits the synergistic effect of both pre-trained language model and graph neural network by optimizing with two specialized self-supervised learning algorithms: graph-centric contrastive learning and graph-centric knowledge alignment. The proposed graph-centric self-supervised learning algorithms effectively help GRENADE to capture informative textual semantics as well as structural context information on text-attributed graphs. Through extensive experiments, GRENADE shows its superiority over state-of-the-art methods. Implementation is available at \url{https://github.com/bigheiniu/GRENADE}.
KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness
Li, Yichuan, Han, Jialong, Lee, Kyumin, Ma, Chengyuan, Yao, Benjamin, Liu, Derek
In recent years, Pre-trained Language Models (PLMs) have shown their superiority by pre-training on unstructured text corpus and then fine-tuning on downstream tasks. On entity-rich textual resources like Wikipedia, Knowledge-Enhanced PLMs (KEPLMs) incorporate the interactions between tokens and mentioned entities in pre-training, and are thus more effective on entity-centric tasks such as entity linking and relation classification. Although exploiting Wikipedia's rich structures to some extent, conventional KEPLMs still neglect a unique layout of the corpus where each Wikipedia page is around a topic entity (identified by the page URL and shown in the page title). In this paper, we demonstrate that KEPLMs without incorporating the topic entities will lead to insufficient entity interaction and biased (relation) word semantics. We thus propose KEPLET, a novel Knowledge-Enhanced Pre-trained LanguagE model with Topic entity awareness. In an end-to-end manner, KEPLET identifies where to add the topic entity's information in a Wikipedia sentence, fuses such information into token and mentioned entities representations, and supervises the network learning, through which it takes topic entities back into consideration. Experiments demonstrated the generality and superiority of KEPLET which was applied to two representative KEPLMs, achieving significant improvements on four entity-centric tasks.
Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection
Vo, Nguyen, Lee, Kyumin
To detect fake news, researchers proposed to use The proliferation of biased news, misleading linguistics and textual content (Castillo et al., 2011; claims, disinformation and fake news has caused Zhao et al., 2015; Liu et al., 2015). Since textual heightened negative effects on modern society in claims are usually deliberately written to deceive various domains ranging from politics, economics readers, it is hard to detect fake news by solely to public health. A recent study showed that maliciously relying on the content claims. Therefore, multiple fabricated and partisan stories possibly works utilized other signals such as temporal caused citizens' misperception about political candidates spreading patterns (Liu and Wu, 2018), network (Allcott and Gentzkow, 2017) during the structures (Wu and Liu, 2018; Vo and Lee, 2018; 2016 U.S. presidential elections. In economics, the Shu et al., 2020) and users' feedbacks (Vo and spread of fake news has manipulated stock price Lee, 2019; Shu et al., 2019; Vo and Lee, 2020a).
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
Tran, Thanh, Hu, Yifan, Hu, Changwei, Yen, Kevin, Tan, Fei, Lee, Kyumin, Park, Serim
We present our HABERTOR model for detecting hatespeech in large scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture, but is different in four aspects: (i) it generates its own vocabularies and is pre-trained from the scratch using the largest scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training and inferencing, as well as less memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources, to further enhance its effectiveness; and (iv) it uses a regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on the large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR works better than 15 state-of-the-art hatespeech detection methods, including fine-tuning Language Models. In particular, comparing with BERT, our HABERTOR is 4~5 times faster in the training/inferencing phase, uses less than 1/3 of the memory, and has better performance, even though we pre-train it by using less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for the hatespeech classification.
Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News
Vo, Nguyen, Lee, Kyumin
Although many fact-checking systems have been developed in academia and industry, fake news is still proliferating on social media. These systems mostly focus on fact-checking but usually neglect online users who are the main drivers of the spread of misinformation. How can we use fact-checked information to improve users' consciousness of fake news to which they are exposed? How can we stop users from spreading fake news? To tackle these questions, we propose a novel framework to search for fact-checking articles, which address the content of an original tweet (that may contain misinformation) posted by online users. The search can directly warn fake news posters and online users (e.g. the posters' followers) about misinformation, discourage them from spreading fake news, and scale up verified content on social media. Our framework uses both text and images to search for fact-checking articles, and achieves promising results on real-world datasets. Our code and datasets are released at https://github.com/nguyenvo09/EMNLP2020.
Quaternion-Based Self-Attentive Long Short-Term User Preference Encoding for Recommendation
Tran, Thanh, You, Di, Lee, Kyumin
Quaternion space has brought several benefits over the traditional Euclidean space: Quaternions (i) consist of a real and three imaginary components, encouraging richer representations; (ii) utilize Hamilton product which better encodes the inter-latent interactions across multiple Quaternion components; and (iii) result in a model with smaller degrees of freedom and less prone to overfitting. Unfortunately, most of the current recommender systems rely on real-valued representations in Euclidean space to model either user's long-term or short-term interests. In this paper, we fully utilize Quaternion space to model both user's long-term and short-term preferences. We first propose a QUaternion-based self-Attentive Long term user Encoding (QUALE) to study the user's long-term intents. Then, we propose a QUaternion-based self-Attentive Short term user Encoding (QUASE) to learn the user's short-term interests. To enhance our models' capability, we propose to fuse QUALE and QUASE into one model, namely QUALSE, by using a Quaternion-based gating mechanism. We further develop Quaternion-based Adversarial learning along with the Bayesian Personalized Ranking (QABPR) to improve our model's robustness. Extensive experiments on six real-world datasets show that our fused QUALSE model outperformed 11 state-of-the-art baselines, improving 8.43% at HIT@1 and 10.27% at NDCG@1 on average compared with the best baseline.
Signed Distance-based Deep Memory Recommender
Tran, Thanh, Liu, Xinyue, Lee, Kyumin, Kong, Xiangnan
Personalized recommendation algorithms learn a user's preference for an item by measuring a distance/similarity between them. However, some of the existing recommendation models (e.g., matrix factorization) assume a linear relationship between the user and item. This approach limits the capacity of recommender systems, since the interactions between users and items in real-world applications are much more complex than the linear relationship. To overcome this limitation, in this paper, we design and propose a deep learning framework called Signed Distance-based Deep Memory Recommender, which captures non-linear relationships between users and items explicitly and implicitly, and work well in both general recommendation task and shopping basket-based recommendation task. Through an extensive empirical study on six real-world datasets in the two recommendation tasks, our proposed approach achieved significant improvement over ten state-of-the-art recommendation models.
Regularizing Matrix Factorization with User and Item Embeddings for Recommendation
Tran, Thanh, Lee, Kyumin, Liao, Yiming, Lee, Dongwon
Following recent successes in exploiting both latent factor and word embedding models in recommendation, we propose a novel Regularized Multi-Embedding (RME) based recommendation model that simultaneously encapsulates the following ideas via decomposition: (1) which items a user likes, (2) which two users co-like the same items, (3) which two items users often co-liked, and (4) which two items users often co-disliked. In experimental validation, the RME outperforms competing state-of-the-art models in both explicit and implicit feedback datasets, significantly improving Recall@5 by 5.9~7.0%, NDCG@20 by 4.3~5.6%, and MAP@10 by 7.9~8.9%. In addition, under the cold-start scenario for users with the lowest number of interactions, against the competing models, the RME outperforms NDCG@5 by 20.2% and 29.4% in MovieLens-10M and MovieLens-20M datasets, respectively. Our datasets and source code are available at: https://github.com/thanhdtran/RME.git.
Crowds, Gigs, and Super Sellers: A Measurement Study of a Supply-Driven Crowdsourcing Marketplace
Ge, Hancheng (Texas A&M University) | Caverlee, James (Texas A&M University) | Lee, Kyumin (Utah State University)
The crowdsourcing movement has spawned a host of successful efforts that organize large numbers of globally-distributed participants to tackle a range of tasks. While many demand-driven crowd marketplaces have emerged (like Amazon Mechanical Turk, often resulting in workers that are essentially replace-able), we are witnessing the rise of supply-driven marketplaces where specialized workers offer their expertise. In this paper, we present a comprehensive data-driven measurement study of one prominent supply-driven marketplace -- Fiverr -- wherein we investigate the sellers and their offerings (called "gigs"). As part of this investigation, we identify the key features distinguishing "super sellers" from regular participants and develop a machine learning based approach for inferring the quality of gigs, which is especially important for the vast majority of gigs with little feedback.