Liu, Chenghao
DualNet: Continual Learning, Fast and Slow
Pham, Quang, Liu, Chenghao, Hoi, Steven
According to Complementary Learning Systems (CLS) theory~\citep{mcclelland1995there} in neuroscience, humans achieve effective \emph{continual learning} through two complementary systems: a fast learning system centered on the hippocampus for rapid learning of specifics and individual experiences, and a slow learning system located in the neocortex for the gradual acquisition of structured knowledge about the environment. Motivated by this theory, we propose a novel continual learning framework named "DualNet", which comprises a fast learning system for supervised learning of pattern-separated representations from specific tasks and a slow learning system for learning task-agnostic, general representations via Self-Supervised Learning (SSL). The fast and slow learning systems are complementary and work seamlessly in a holistic continual learning framework. Our extensive experiments on two challenging continual learning benchmarks, CORe50 and miniImageNet, show that DualNet outperforms state-of-the-art continual learning methods by a large margin. We further conduct ablation studies of different SSL objectives to validate DualNet's efficacy, robustness, and scalability. Code will be made available upon acceptance.
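As a rough illustration of the fast/slow split, here is a minimal sketch: the module names, layer sizes, and the way the fast head consumes slow features are placeholders for exposition, not the paper's exact design, and the SSL objective for the slow system is omitted.

```python
import torch
import torch.nn as nn

class SlowLearner(nn.Module):
    """Task-agnostic representation learner, trained with an SSL objective."""
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.encoder(x)

class FastLearner(nn.Module):
    """Supervised head that rapidly adapts the slow features to the current task."""
    def __init__(self, dim=128, n_classes=10):
        super().__init__()
        self.head = nn.Linear(dim, n_classes)

    def forward(self, z):
        return self.head(z)

slow, fast = SlowLearner(), FastLearner()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits = fast(slow(x))                                 # fast system consumes slow features
sup_loss = nn.functional.cross_entropy(logits, y)      # supervised loss updates the fast system
# The slow system would additionally be updated with an SSL loss computed on
# augmented views of x; that objective is omitted here for brevity.
```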
Modeling Dynamic Attributes for Next Basket Recommendation
Chen, Yongjun, Li, Jia, Liu, Chenghao, Li, Chenxi, Anderle, Markus, McAuley, Julian, Xiong, Caiming
Traditional approaches to next-item and next-basket recommendation typically extract users' interests from their past interactions and associated static contextual information (e.g., a user id or item category). However, the extracted interests can be inaccurate and become obsolete. Dynamic attributes, such as user income and item prices, change over time, and such dynamics can intrinsically reflect the evolution of users' interests. We argue that modeling these dynamic attributes can boost recommendation performance. However, properly integrating them into user interest models is challenging: attribute dynamics are diverse (e.g., time-interval-aware or periodic patterns), and they represent users' behaviors from different perspectives, possibly asynchronously with interactions. Beyond dynamic attributes, the items in each basket contain complex interdependencies that can be beneficial but are nontrivial to capture effectively. To address these challenges, we propose a novel Attentive network to model Dynamic attributes (named AnDa). AnDa encodes dynamic attributes and basket item sequences separately. We design a periodic-aware encoder that allows the model to capture various temporal patterns from dynamic attributes, and we propose an intra-basket attention module to effectively learn useful item relationships. Experimental results on three real-world datasets demonstrate that our method consistently outperforms the state of the art.
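To make the intra-basket idea concrete, a minimal sketch of self-attention over the items of a single basket follows; the function name and the plain scaled dot-product form are illustrative assumptions, not necessarily the paper's exact module.

```python
import torch
import torch.nn.functional as F

def intra_basket_attention(item_emb):
    """Self-attention over the items of one basket.

    item_emb: (n_items, d) embeddings of the items in a single basket.
    Returns contextualized item embeddings of the same shape.
    """
    d = item_emb.size(-1)
    scores = item_emb @ item_emb.T / d ** 0.5    # pairwise item affinities
    weights = F.softmax(scores, dim=-1)          # attention over co-occurring items
    return weights @ item_emb

basket = torch.randn(5, 64)   # a basket of 5 items with 64-dim embeddings
contextualized = intra_basket_attention(basket)
```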
Merlion: A Machine Learning Library for Time Series
Bhatnagar, Aadyot, Kassianik, Paul, Liu, Chenghao, Lan, Tian, Yang, Wenzhuo, Cassius, Rowan, Sahoo, Doyen, Arpit, Devansh, Subramanian, Sri, Woo, Gerald, Saha, Amrita, Jagota, Arun Kumar, Gopalakrishnan, Gokulakrishnan, Singh, Manpreet, Krithika, K C, Maddineni, Sukumar, Cho, Daeki, Zong, Bo, Zhou, Yingbo, Xiong, Caiming, Savarese, Silvio, Hoi, Steven, Wang, Huan
We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre/post-processing layers. It includes several modules to improve ease of use: visualization, anomaly score calibration to improve interpretability, AutoML for hyperparameter tuning and model selection, and model ensembling. Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production. This library aims to provide engineers and researchers with a one-stop solution to rapidly develop models for their specific time series needs and benchmark them across multiple time series datasets. In this technical report, we highlight Merlion's architecture and major functionalities, and we report benchmark numbers across different baseline models and ensembles.
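A short usage sketch, following the quick-start interface shown in Merlion's public README (the CSV file name and the train/test split point are placeholders):

```python
import pandas as pd
from merlion.utils import TimeSeries
from merlion.models.defaults import DefaultDetector, DefaultDetectorConfig

# Load a time series into Merlion's TimeSeries container
df = pd.read_csv("my_metric.csv", index_col=0, parse_dates=True)
train, test = TimeSeries.from_pd(df[:1000]), TimeSeries.from_pd(df[1000:])

# Train a default anomaly detector and score the held-out data
model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train)
anomaly_scores = model.get_anomaly_score(test)
```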
Learning Transferable Parameters for Long-tailed Sequential User Behavior Modeling
Yin, Jianwen, Liu, Chenghao, Wang, Weiqing, Sun, Jianling, Hoi, Steven C. H.
Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling heavily depends on the scale and quality of historical behaviors, yet the number of user behaviors inherently follows a long-tailed distribution, which has seldom been explored. In this work, we argue that focusing on tail users can bring more benefits, and we address the long-tail issue by learning transferable parameters from both the optimization and feature perspectives. Specifically, we propose a gradient alignment optimizer and adopt an adversarial training scheme to facilitate knowledge transfer from head to tail users. These methods can also handle the cold-start problem for new users, and the framework can be directly applied to various well-established sequential models. Extensive experiments on four real-world datasets verify the superiority of our framework over state-of-the-art baselines.
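One common way to operationalize gradient alignment between two objectives is conflict-aware projection, sketched below; this is a generic scheme in the spirit of the paper's optimizer, not its exact update rule, and the function name is hypothetical.

```python
import torch

def align_gradients(g_tail, g_head):
    """Project the tail-user gradient so it does not conflict with the head.

    Both inputs are flattened 1-D parameter gradients. If they point in
    opposing directions (negative inner product), the conflicting component
    is removed, so tail updates do not undo knowledge learned from head users.
    """
    dot = torch.dot(g_tail, g_head)
    if dot < 0:
        g_tail = g_tail - dot / g_head.norm().pow(2) * g_head
    return g_tail

g_t, g_h = torch.randn(10), torch.randn(10)
g_t_aligned = align_gradients(g_t, g_h)
```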
Decentralized Knowledge Graph Representation Learning
Guo, Lingbing, Wang, Weiqing, Sun, Zequn, Liu, Chenghao, Hu, Wei
Knowledge graph (KG) representation learning methods have achieved competitive performance in many KG-oriented tasks. The best among them are usually based on graph neural networks (GNNs), a powerful family of networks that learn the representation of an entity by aggregating the features of its neighbors and itself. However, many KG representation learning scenarios only provide structure information describing the relationships among entities, so that entities have no input features. In this case, existing aggregation mechanisms cannot induce embeddings for unseen entities, as these entities have no pre-defined features to aggregate. In this paper, we present a decentralized KG representation learning approach, decentRL, which encodes each entity from, and only from, the embeddings of its neighbors. For optimization, we design an algorithm that distills knowledge from the model itself, so that the output embeddings continuously gain knowledge from the corresponding original embeddings. Extensive experiments show that the proposed approach performs better than many cutting-edge models on the entity alignment task and achieves competitive performance on the entity prediction task. Furthermore, under the inductive setting, it significantly outperforms all baselines on both tasks.
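A minimal sketch of the decentralized idea, where an entity's representation is built only from its neighbors' embeddings (mean pooling here is a placeholder aggregator; the paper's attention-style aggregator and self-distillation loss are omitted):

```python
import torch

def decentralized_embedding(neighbors, emb):
    """Represent an entity from (and only from) its neighbors' embeddings.

    neighbors: list of neighbor entity ids
    emb:       (n_entities, d) embedding table
    Note: the entity's own embedding never enters the aggregation, so unseen
    entities can still be encoded as long as their neighbors are known.
    """
    return emb[torch.tensor(neighbors)].mean(dim=0)

emb = torch.randn(100, 32)
z = decentralized_embedding(neighbors=[3, 15, 42], emb=emb)
```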
Graph Prototypical Networks for Few-shot Learning on Attributed Networks
Ding, Kaize, Wang, Jianling, Li, Jundong, Shu, Kai, Liu, Chenghao, Liu, Huan
Attributed networks are now ubiquitous in a myriad of high-impact applications, such as social network analysis, financial fraud detection, and drug discovery. As a central analytical task on attributed networks, node classification has received much attention in the research community. In real-world attributed networks, a large portion of node classes contain only limited labeled instances, rendering a long-tailed node class distribution, and existing node classification algorithms are unequipped to handle these \textit{few-shot} node classes. As a remedy, few-shot learning has attracted a surge of attention in the research community. Yet few-shot node classification remains a challenging problem, as we need to address the following questions: (i) How do we extract meta-knowledge from an attributed network for few-shot node classification? (ii) How do we identify the informativeness of each labeled instance for building a robust and effective model? To answer these questions, in this paper, we propose a graph meta-learning framework -- Graph Prototypical Networks (GPN). By constructing a pool of semi-supervised node classification tasks to mimic the real test environment, GPN is able to perform \textit{meta-learning} on an attributed network and derive a highly generalizable model for handling the target classification task. Extensive experiments demonstrate the superior capability of GPN in few-shot node classification.
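For reference, the standard prototypical-network computation that underlies this family of methods, sketched below: class prototypes are mean support embeddings and queries are classified by distance. GPN additionally weights support instances by informativeness, which this generic sketch omits.

```python
import torch

def prototypes(support_emb, support_y, n_classes):
    """Class prototype = mean embedding of that class's labeled support nodes."""
    return torch.stack([support_emb[support_y == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query node via softmax over negative prototype distances."""
    dists = torch.cdist(query_emb, protos)   # (n_query, n_classes)
    return (-dists).softmax(dim=-1)          # class probabilities

sup = torch.randn(10, 16)
y = torch.tensor([0] * 5 + [1] * 5)          # 5 support nodes per class
probs = classify(torch.randn(4, 16), prototypes(sup, y, n_classes=2))
```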
Bilevel Continual Learning
Pham, Quang, Sahoo, Doyen, Liu, Chenghao, Hoi, Steven C. H.
Continual learning aims to learn continuously from a stream of tasks and data in an online fashion, exploiting what was learned previously to improve current and future tasks while still performing well on past tasks. One common limitation of many existing continual learning methods is that they often train a model directly on all available training data without validation, due to the nature of continual learning, and thus suffer poor generalization at test time. In this work, we present a novel continual learning framework named "Bilevel Continual Learning" (BCL), which unifies a bilevel optimization objective with a dual memory management strategy comprising both an episodic memory and a generalization memory, to achieve effective knowledge transfer to future tasks while alleviating catastrophic forgetting on old tasks. Our extensive experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
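Schematically, a bilevel objective of this kind takes the following generic form (a sketch of the general structure, not BCL's exact objective): the outer level optimizes against a validation/generalization memory, subject to the inner level fitting the current task and episodic memory.

```latex
\begin{aligned}
\min_{\theta} \;\; & \mathcal{L}_{\text{val}}\big(\phi^{*}(\theta)\big) \\
\text{s.t.} \;\; & \phi^{*}(\theta) \in \arg\min_{\phi} \; \mathcal{L}_{\text{train}}(\phi; \theta)
\end{aligned}
```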
Graph Neural Networks with High-order Feature Interactions
Ding, Kaize, Li, Yichuan, Li, Jundong, Liu, Chenghao, Liu, Huan
Network representation learning, a fundamental research problem that aims to learn low-dimensional node representations on graph-structured data, has been extensively studied in the research community. By generalizing the power of neural networks to graph-structured data, graph neural networks (GNNs) achieve superior capability in network representation learning. However, the node features of many real-world graphs can be high-dimensional and sparse, rendering the node representations learned by existing GNN architectures less expressive. The main reason is that these models directly use the raw node features as input for message passing and have limited power to capture sophisticated interactions between features. In this paper, we propose a novel GNN framework for learning node representations that incorporate high-order feature interactions on feature-sparse graphs. Specifically, the proposed message aggregator and feature factorizer extract two channels of embeddings from the feature-sparse graph, characterizing the aggregated node features and the high-order feature interactions, respectively. Furthermore, we develop an attentive fusion network to seamlessly combine the information from the two channels and learn feature-interaction-aware node representations. Extensive experiments on various datasets demonstrate the effectiveness of the proposed framework on a variety of graph learning tasks.
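A feature factorizer of this kind is reminiscent of a factorization machine; the sketch below shows the standard FM pairwise-interaction identity that makes second-order interactions tractable on sparse features (an illustrative building block, not necessarily the paper's exact factorizer).

```python
import torch

def fm_pairwise_interactions(x, V):
    """Second-order factorized feature interactions in O(k*d) via the FM identity:
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * (||sum_i x_i v_i||^2 - sum_i x_i^2 ||v_i||^2).

    x: (d,) feature vector;  V: (d, k) feature factor matrix.
    """
    xv = x.unsqueeze(-1) * V                                  # (d, k) scaled factors
    return 0.5 * (xv.sum(0).pow(2) - xv.pow(2).sum(0)).sum()  # scalar interaction score

x, V = torch.randn(1000), torch.randn(1000, 8)
interaction_score = fm_pairwise_interactions(x, V)
```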
SL$^2$MF: Predicting Synthetic Lethality in Human Cancers via Logistic Matrix Factorization
Liu, Yong, Wu, Min, Liu, Chenghao, Li, Xiao-Li, Zheng, Jie
Synthetic lethality (SL) is a promising concept for the discovery of novel anti-cancer drug targets. However, wet-lab experiments for detecting SLs face various challenges, such as high cost and low consistency across platforms or cell lines; computational prediction methods are therefore needed. This paper proposes a novel SL prediction method, named SL2MF, which employs logistic matrix factorization to learn latent representations of genes from observed SL data. The probability that two genes form an SL pair is modeled via a linear combination of the gene latent vectors. As known SL pairs are more trustworthy than unknown pairs, we design importance weighting schemes that assign higher importance weights to known SL pairs and lower weights to unknown pairs in SL2MF. Moreover, we incorporate biological knowledge about genes from protein-protein interaction (PPI) data and the Gene Ontology (GO). In particular, we calculate the similarity between genes based on their GO annotations and their topological properties in the PPI network. Extensive experiments on SL interaction data from the SynLethDB database demonstrate the effectiveness of SL2MF.
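For intuition, the standard logistic matrix factorization form consistent with this description is sketched below, where $\mathbf{u}_i, \mathbf{u}_j$ are gene latent vectors, $y_{ij}$ the observed SL label, and $c_{ij}$ the importance weight (a generic formulation, not necessarily SL2MF's exact objective):

```latex
p_{ij} = \sigma\!\big(\mathbf{u}_i^{\top}\mathbf{u}_j\big)
       = \frac{1}{1 + \exp(-\mathbf{u}_i^{\top}\mathbf{u}_j)},
\qquad
\max_{U} \; \sum_{i<j} c_{ij}\Big[\, y_{ij}\log p_{ij} + (1-y_{ij})\log(1-p_{ij}) \Big]
```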
Unified Locally Linear Classifiers With Diversity-Promoting Anchor Points
Liu, Chenghao, Zhang, Teng, Zhao, Peilin, Sun, Jianling, Hoi, Steven C. H.
Locally Linear Support Vector Machines (LLSVM) have been actively used in classification tasks due to their capability of classifying nonlinear patterns. However, existing LLSVM suffers from two drawbacks: (1) a particular and appropriate regularization for LLSVM has not yet been addressed; (2) it usually adopts a three-stage learning scheme: learning anchor points by clustering, learning local coding coordinates by a predefined coding scheme, and finally training the classifiers. We argue that this decoupled approach oversimplifies the original optimization problem, resulting in a large deviation due to the disparate purpose of each step. To address the first issue, we propose a novel diversified regularization that captures infrequent patterns and reduces the model size without sacrificing representation power. Based on this regularization, we develop a joint optimization algorithm over anchor points, local coding coordinates, and classifiers to simultaneously minimize the overall classification risk, termed Diversified and Unified Locally Linear Support Vector Machine (DU-LLSVM for short). To the best of our knowledge, DU-LLSVM is the first principled method that directly learns sparse local coding, and it can be easily generalized to other supervised learning models. Extensive experiments show that DU-LLSVM consistently surpasses several state-of-the-art methods with a predefined local coding scheme (e.g., LLSVM) or supervised anchor point learning (e.g., SAPL-LLSVM).
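For reference, the standard locally linear SVM decision function that DU-LLSVM learns jointly is sketched below, where $\gamma_k(\mathbf{x})$ are the local coding coordinates of $\mathbf{x}$ with respect to the anchor points and $(\mathbf{w}_k, b_k)$ is the local classifier at anchor $k$ (the general LLSVM form, hedged as background rather than the paper's full joint objective):

```latex
f(\mathbf{x}) = \sum_{k=1}^{K} \gamma_k(\mathbf{x})\,\big(\mathbf{w}_k^{\top}\mathbf{x} + b_k\big),
\qquad \sum_{k=1}^{K} \gamma_k(\mathbf{x}) = 1
```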