Collaborating Authors


Towards a Theoretical Understanding of Word and Relation Representation Machine Learning

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily assessed, whereas judging that from their spelling is often impossible (e.g. cat /feline) and to predetermine and store similarities between all words is prohibitively time-consuming, memory intensive and subjective. We focus on word embeddings learned from text corpora and knowledge graphs. Several well-known algorithms learn word embeddings from text on an unsupervised basis by learning to predict those words that occur around each word, e.g. word2vec and GloVe. Parameters of such word embeddings are known to reflect word co-occurrence statistics, but how they capture semantic meaning has been unclear. Knowledge graph representation models learn representations both of entities (words, people, places, etc.) and relations between them, typically by training a model to predict known facts in a supervised manner. Despite steady improvements in fact prediction accuracy, little is understood of the latent structure that enables this. The limited understanding of how latent semantic structure is encoded in the geometry of word embeddings and knowledge graph representations makes a principled means of improving their performance, reliability or interpretability unclear. To address this: 1. we theoretically justify the empirical observation that particular geometric relationships between word embeddings learned by algorithms such as word2vec and GloVe correspond to semantic relations between words; and 2. we extend this correspondence between semantics and geometry to the entities and relations of knowledge graphs, providing a model for the latent structure of knowledge graph representation linked to that of word embeddings.

Adaptive Memory Networks with Self-supervised Learning for Unsupervised Anomaly Detection Artificial Intelligence

Unsupervised anomaly detection aims to build models to effectively detect unseen anomalies by only training on the normal data. Although previous reconstruction-based methods have made fruitful progress, their generalization ability is limited due to two critical challenges. First, the training dataset only contains normal patterns, which limits the model generalization ability. Second, the feature representations learned by existing models often lack representativeness which hampers the ability to preserve the diversity of normal patterns. In this paper, we propose a novel approach called Adaptive Memory Network with Self-supervised Learning (AMSL) to address these challenges and enhance the generalization ability in unsupervised anomaly detection. Based on the convolutional autoencoder structure, AMSL incorporates a self-supervised learning module to learn general normal patterns and an adaptive memory fusion module to learn rich feature representations. Experiments on four public multivariate time series datasets demonstrate that AMSL significantly improves the performance compared to other state-of-the-art methods. Specifically, on the largest CAP sleep stage detection dataset with 900 million samples, AMSL outperforms the second-best baseline by \textbf{4}\%+ in both accuracy and F1 score. Apart from the enhanced generalization ability, AMSL is also more robust against input noise.

Towards Graph Self-Supervised Learning with Contrastive Adjusted Zooming Artificial Intelligence

Graph representation learning (GRL) is critical for graph-structured data analysis. However, most of the existing graph neural networks (GNNs) heavily rely on labeling information, which is normally expensive to obtain in the real world. Existing unsupervised GRL methods suffer from certain limitations, such as the heavy reliance on monotone contrastiveness and limited scalability. To overcome the aforementioned problems, in light of the recent advancements in graph contrastive learning, we introduce a novel self-supervised graph representation learning algorithm via Graph Contrastive Adjusted Zooming, namely G-Zoom, to learn node representations by leveraging the proposed adjusted zooming scheme. Specifically, this mechanism enables G-Zoom to explore and extract self-supervision signals from a graph from multiple scales: micro (i.e., node-level), meso (i.e., neighbourhood-level), and macro (i.e., subgraph-level). Firstly, we generate two augmented views of the input graph via two different graph augmentations. Then, we establish three different contrastiveness on the above three scales progressively, from node, neighbouring, to subgraph level, where we maximize the agreement between graph representations across scales. While we can extract valuable clues from a given graph on the micro and macro perspectives, the neighbourhood-level contrastiveness offers G-Zoom the capability of a customizable option based on our adjusted zooming scheme to manually choose an optimal viewpoint that lies between the micro and macro perspectives to better understand the graph data. Additionally, to make our model scalable to large graphs, we employ a parallel graph diffusion approach to decouple model training from the graph size. We have conducted extensive experiments on real-world datasets, and the results demonstrate that our proposed model outperforms state-of-the-art methods consistently.

Lexico-semantic and affective modelling of Spanish poetry: A semi-supervised learning approach Artificial Intelligence

Text classification tasks have improved substantially during the last years by the usage of transformers. However, the majority of researches focus on prose texts, with poetry receiving less attention, specially for Spanish language. In this paper, we propose a semi-supervised learning approach for inferring 21 psychological categories evoked by a corpus of 4572 sonnets, along with 10 affective and lexico-semantic multiclass ones. The subset of poems used for training an evaluation includes 270 sonnets. With our approach, we achieve an AUC beyond 0.7 for 76% of the psychological categories, and an AUC over 0.65 for 60% on the multiclass ones. The sonnets are modelled using transformers, through sentence embeddings, along with lexico-semantic and affective features, obtained by using external lexicons. Consequently, we see that this approach provides an AUC increase of up to 0.12, as opposed to using transformers alone.

A New Interpolation Approach and Corresponding Instance-Based Learning Artificial Intelligence

Starting from finding approximate value of a function, introduces the measure of approximation-degree between two numerical values, proposes the concepts of "strict approximation" and "strict approximation region", then, derives the corresponding one-dimensional interpolation methods and formulas, and then presents a calculation model called "sum-times-difference formula" for high-dimensional interpolation, thus develops a new interpolation approach, that is, ADB interpolation. ADB interpolation is applied to the interpolation of actual functions with satisfactory results. Viewed from principle and effect, the interpolation approach is of novel idea, and has the advantages of simple calculation, stable accuracy, facilitating parallel processing, very suiting for high-dimensional interpolation, and easy to be extended to the interpolation of vector valued functions. Applying the approach to instance-based learning, a new instance-based learning method, learning using ADB interpolation, is obtained. The learning method is of unique technique, which has also the advantages of definite mathematical basis, implicit distance weights, avoiding misclassification, high efficiency, and wide range of applications, as well as being interpretable, etc. In principle, this method is a kind of learning by analogy, which and the deep learning that belongs to inductive learning can complement each other, and for some problems, the two can even have an effect of "different approaches but equal results" in big data and cloud computing environment. Thus, the learning using ADB interpolation can also be regarded as a kind of "wide learning" that is dual to deep learning.

Exploring Semi-Supervised Learning for Predicting Listener Backchannels Artificial Intelligence

Developing human-like conversational agents is a prime area in HCI research and subsumes many tasks. Predicting listener backchannels is one such actively-researched task. While many studies have used different approaches for backchannel prediction, they all have depended on manual annotations for a large dataset. This is a bottleneck impacting the scalability of development. To this end, we propose using semi-supervised techniques to automate the process of identifying backchannels, thereby easing the annotation process. To analyze our identification module's feasibility, we compared the backchannel prediction models trained on (a) manually-annotated and (b) semi-supervised labels. Quantitative analysis revealed that the proposed semi-supervised approach could attain 95% of the former's performance. Our user-study findings revealed that almost 60% of the participants found the backchannel responses predicted by the proposed model more natural. Finally, we also analyzed the impact of personality on the type of backchannel signals and validated our findings in the user-study.

Self-supervised Learning for Large-scale Item Recommendations Machine Learning

Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items, the power-law user feedback makes labels very sparse for a large amount of long-tail items. Inspired by the recent success in self-supervised representation learning research in both computer vision and natural language understanding, we propose a multi-task self-supervised learning (SSL) framework for large-scale item recommendations. The framework is designed to tackle the label sparsity problem by learning more robust item representations. Furthermore, we propose two self-supervised tasks applicable to models with categorical features within the proposed framework: (i) Feature Masking (FM) and (ii) Feature Dropout (FD). We evaluate our framework using two large-scale datasets with 500M and 1B training examples respectively. Our results demonstrate that the proposed framework outperforms traditional supervised learning only models and state-of-the-art regularization techniques in the context of item recommendations. The SSL framework shows larger improvement with less supervision compared to the counterparts. We also apply the proposed techniques to a web-scale commercial app-to-app recommendation system, and significantly improve top-tier business metrics via A/B experiments on live traffic. Our online results also verify our hypothesis that our framework indeed improves model performance on slices that lack supervision.

Auxiliary Task Reweighting for Minimum-data Learning Machine Learning

Supervised learning requires a large amount of training data, limiting its application where labeled data is scarce. To compensate for data scarcity, one possible method is to utilize auxiliary tasks to provide additional supervision for the main task. Assigning and optimizing the importance weights for different auxiliary tasks remains an crucial and largely understudied research question. In this work, we propose a method to automatically reweight auxiliary tasks in order to reduce the data requirement on the main task. Specifically, we formulate the weighted likelihood function of auxiliary tasks as a surrogate prior for the main task. By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search. In multiple experimental settings (e.g. semi-supervised learning, multi-label classification), we demonstrate that our algorithm can effectively utilize limited labeled data of the main task with the benefit of auxiliary tasks compared with previous task reweighting methods. We also show that under extreme cases with only a few extra examples (e.g. few-shot domain adaptation), our algorithm results in significant improvement over the baseline.

Chapter 1: Introduction to Machine Learning and Deep Learning


Throughout this book, we will use the term machine learning to refer to both traditional machine learning and deep learning. Supervised learning is focused on predictive modeling tasks, that is, modeling the relationship between features extracted from the data and one or multiple target variables or labels. While supervised learning is based on labeled data, unsupervised learning aims to model the hidden structure in data without label information. Finally, reinforcement learning is concerned with developing reward systems to model complex decision processes and learning series of actions. Supervised learning is concerned with predicting a target value given input observations.

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax Machine Learning

Solving long-tail large vocabulary object detection with deep learning based models is a challenging and demanding task, which is however under-explored.In this work, we provide the first systematic analysis on the underperformance of state-of-the-art models in front of long-tail distribution. We find existing detection methods are unable to model few-shot classes when the dataset is extremely skewed, which can result in classifier imbalance in terms of parameter magnitude. Directly adapting long-tail classification models to detection frameworks can not solve this problem due to the intrinsic difference between detection and classification.In this work, we propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training. It implicitly modulates the training process for the head and tail classes and ensures they are both sufficiently trained, without requiring any extra sampling for the instances from the tail classes.Extensive experiments on the very recent long-tail large vocabulary object recognition benchmark LVIS show that our proposed BAGS significantly improves the performance of detectors with various backbones and frameworks on both object detection and instance segmentation. It beats all state-of-the-art methods transferred from long-tail image classification and establishes new state-of-the-art.Code is available at