 Supervised Learning


Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons

arXiv.org Artificial Intelligence

We propose a post-processing method, which we call extrofitting, for enriching not only word representations but also their vector space using semantic lexicons. The method consists of three steps: (i) Expanding one or more dimensions on all the word vectors, filling them with their representative values. (ii) Transferring semantic knowledge by averaging the representative values of synonyms and filling them into the expanded dimension(s). These two steps bring the representations of synonyms closer together. (iii) Projecting the vector space using Linear Discriminant Analysis, which eliminates the expanded dimension(s) carrying the semantic knowledge. Experimenting with GloVe, we find that our method outperforms Faruqui's retrofitting on some word similarity tasks. We also report further analysis of our method with respect to word vector dimension and vocabulary size, as well as on other well-known pretrained word vectors (e.g., Word2Vec, fastText).
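
As a concrete illustration, a minimal sketch of the three steps might look like the following, assuming NumPy word vectors and a word-to-synonym-group mapping taken from the lexicon (the function and variable names, and the choice of the component mean as the representative value, are illustrative assumptions, not the authors' released code):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def extrofit(vectors, synset_of):
        # vectors: dict word -> np.ndarray of shape (d,)
        # synset_of: dict word -> synonym-group id from a semantic lexicon
        words = sorted(vectors)
        X = np.stack([vectors[w] for w in words])          # (V, d)
        labels = np.array([synset_of[w] for w in words])

        # (i) expand one dimension, filled with a representative value
        #     (here assumed to be the mean of each vector's components)
        rep = X.mean(axis=1, keepdims=True)                # (V, 1)

        # (ii) synonyms share the average of their representative values
        for g in np.unique(labels):
            rep[labels == g] = rep[labels == g].mean()
        X_ext = np.hstack([X, rep])                        # (V, d+1)

        # (iii) LDA projection back to d dimensions using synonym groups
        #       as classes; note sklearn requires more groups than dims
        lda = LinearDiscriminantAnalysis(n_components=X.shape[1])
        X_out = lda.fit_transform(X_ext, labels)
        return {w: X_out[i] for i, w in enumerate(words)}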


Deep Triplet Ranking Networks for One-Shot Recognition

arXiv.org Machine Learning

Despite the breakthroughs achieved by deep learning models in conventional supervised learning scenarios, their dependence on sufficient labeled training data in each class prevents effective application of these deep models in situations where labeled training instances for a subset of novel classes are very sparse; in the extreme case, only one instance is available for each class. To tackle this natural and important challenge, one-shot learning, which aims to exploit a set of well-labeled base classes to build classifiers for new target classes that have only one observed instance per class, has recently received increasing attention from the research community. In this paper we propose a novel end-to-end deep triplet ranking network to perform one-shot learning. The proposed approach learns class-universal image embeddings on the well-labeled base classes under a triplet ranking loss, such that instances from new classes can be categorized based on their similarity to the one-shot instances in the learned embedding space. Moreover, our approach can naturally incorporate the available one-shot instances from the new classes into the embedding learning process to improve the triplet ranking model. We conduct experiments on two popular datasets for one-shot learning. The results show that the proposed approach achieves better performance than state-of-the-art comparison methods.
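
A bare-bones version of the triplet ranking objective and the nearest-exemplar classification rule, assuming a PyTorch embedding network embed (a sketch of the general technique under these assumptions, not the paper's exact architecture):

    import torch
    import torch.nn.functional as F

    def triplet_ranking_loss(embed, anchor, positive, negative, margin=1.0):
        # pull an anchor toward a same-class positive and away from a
        # different-class negative in the embedding space
        a, p, n = embed(anchor), embed(positive), embed(negative)
        d_ap = F.pairwise_distance(a, p)
        d_an = F.pairwise_distance(a, n)
        return F.relu(d_ap - d_an + margin).mean()

    def classify_one_shot(embed, x, shot_x, shot_y):
        # assign each test instance the label of the nearest one-shot
        # exemplar in the learned embedding space
        d = torch.cdist(embed(x), embed(shot_x))
        return shot_y[d.argmin(dim=1)]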


Exploring Partially Observed Networks with Nonparametric Bandits

arXiv.org Machine Learning

Real-world networks such as social and communication networks are too large to be observed entirely. Such networks are often partially observed, so that the size, topology, and nodes of the original network are unknown. In this paper we formalize the Adaptive Graph Exploring problem. We assume that we are given an incomplete snapshot of a large network, and that additional nodes can be discovered by querying nodes in the currently observed network. The goal is to maximize the number of observed nodes within a given query budget: which set of nodes should we query to maximize the size of the observed network? We formulate this as an exploration-exploitation problem and propose a novel nonparametric multi-armed bandit (MAB) algorithm to identify which nodes to query. Our contributions are: (1) $i$KNN-UCB, a novel nonparametric MAB algorithm that applies $k$-nearest-neighbor UCB to settings where the arms are represented in a vector space; (2) a theoretical guarantee that the $i$KNN-UCB algorithm has sublinear regret; and (3) an evaluation on synthetic networks and real-world networks from different domains, showing that our method discovers up to 40% more nodes than existing baselines.
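
The paper's precise $i$KNN-UCB rule is not reproduced here, but a generic $k$-nearest-neighbor UCB selection step over node feature vectors might be sketched as follows (the reward would be, e.g., the number of newly observed nodes per query; all names and the exact form of the exploration bonus are assumptions):

    import numpy as np

    def knn_ucb_select(cand_feats, hist_feats, hist_rewards, k=5, alpha=1.0):
        # cand_feats: (C, d) feature vectors of candidate nodes to query
        # hist_feats / hist_rewards: features and rewards of past queries
        # (assumes at least k past queries have been made)
        H, r = np.asarray(hist_feats), np.asarray(hist_rewards)
        scores = np.empty(len(cand_feats))
        for i, x in enumerate(cand_feats):
            d = np.linalg.norm(H - x, axis=1)
            nn = np.argsort(d)[:k]
            # mean neighbor reward plus a distance-based exploration bonus
            scores[i] = r[nn].mean() + alpha * d[nn].mean()
        return int(np.argmax(scores))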


Automatic Construction of Parallel Portfolios via Explicit Instance Grouping

arXiv.org Artificial Intelligence

Simultaneously utilizing several complementary solvers is a simple yet effective strategy for solving computationally hard problems. However, manually building such solver portfolios typically requires considerable domain knowledge and plenty of human effort. As an alternative, automatic construction of parallel portfolios (ACPP) aims at automatically building effective parallel portfolios from a given problem instance set and a given rich design space. One promising way to solve the ACPP problem is to explicitly group the instances into different subsets and promote a component solver to handle each of them. This paper investigates solving ACPP from this perspective, and in particular studies how to obtain a good instance grouping. The experimental results show that the parallel portfolios constructed by the proposed method consistently outperform those constructed by state-of-the-art ACPP methods, and can even rival sophisticated hand-designed parallel solvers.
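
As a rough illustration of the explicit-grouping idea (not the paper's actual procedure), one could cluster the instances by their features and configure one component solver per group; configure_solver below is a hypothetical stand-in for an algorithm-configuration tool tuned on each subset:

    from sklearn.cluster import KMeans

    def build_portfolio(instance_feats, n_cores, configure_solver):
        # one component solver per instance group, to be run in parallel
        groups = KMeans(n_clusters=n_cores).fit_predict(instance_feats)
        portfolio = []
        for g in range(n_cores):
            subset = [i for i, gi in enumerate(groups) if gi == g]
            # tune a solver configuration on this subset only
            portfolio.append(configure_solver(subset))
        return portfolio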


KBGAN: Adversarial Learning for Knowledge Graph Embeddings

arXiv.org Artificial Intelligence

We introduce KBGAN, an adversarial learning framework that improves the performance of a wide range of existing knowledge graph embedding models. Because knowledge graphs typically contain only positive facts, sampling useful negative training examples is a non-trivial task. Replacing the head or tail entity of a fact with a uniformly randomly selected entity is the conventional method for generating negative facts, but most of the negative facts generated this way can be easily discriminated from positive facts and contribute little to training. Inspired by generative adversarial networks (GANs), we use one knowledge graph embedding model as a negative sample generator to assist the training of our desired model, which acts as the discriminator in the GAN. The framework is independent of the concrete forms of the generator and discriminator, and can therefore use a wide variety of knowledge graph embedding models as its building blocks. In experiments, we adversarially train two translation-based models, TransE and TransD, each with assistance from one of two probability-based models, DistMult and ComplEx. We evaluate KBGAN on the link prediction task using three knowledge base completion datasets: FB15k-237, WN18, and WN18RR. Experimental results show that adversarial training substantially improves the performance of the target embedding models under various settings.
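
A condensed sketch of one KBGAN-style training step, assuming generator.score and discriminator.score are hypothetical methods returning triple-plausibility scores (higher meaning more plausible is an assumed convention):

    import torch
    import torch.nn.functional as F

    def kbgan_step(generator, discriminator, h, r, t, cand_t, margin=1.0):
        # the generator scores candidate corrupted tails and samples a
        # hard negative from the resulting distribution
        probs = F.softmax(generator.score(h, r, cand_t), dim=-1)  # (B, C)
        idx = torch.multinomial(probs, 1).squeeze(-1)             # (B,)
        neg_t = cand_t[idx]

        # the discriminator (the target model) trains with a margin loss
        d_loss = F.relu(margin
                        - discriminator.score(h, r, t)
                        + discriminator.score(h, r, neg_t)).mean()

        # the generator is updated with REINFORCE, rewarded when the
        # discriminator still finds the sampled negative plausible
        reward = discriminator.score(h, r, neg_t).detach()
        log_p = torch.log(probs.gather(-1, idx.unsqueeze(-1)).squeeze(-1))
        g_loss = -(reward * log_p).mean()
        return d_loss, g_loss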


ClassiNet -- Predicting Missing Features for Short-Text Classification

arXiv.org Artificial Intelligence

The fundamental problem in short-text classification is \emph{feature sparseness}: the lack of feature overlap between a trained model and a test instance to be classified. We propose \emph{ClassiNet}, a network of classifiers trained to predict missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors, each predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet, with a one-to-one correspondence between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting vertex $v_i$ to vertex $v_j$ represents the conditional probability that, given $v_i$ exists in an instance, $v_j$ also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features from the ClassiNet that did not appear in $\vec{x}$, and append those features to the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that ClassiNet yields statistically significant accuracy improvements on short-text classification tasks, without using any external resources such as thesauri to find related features.
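
In the word co-occurrence special case that ClassiNets generalize, the edge weights can be estimated directly from feature co-occurrence counts; a simplified sketch of weight estimation and one-step feature expansion (ignoring the trained feature predictors for brevity, with illustrative thresholds):

    from collections import Counter
    from itertools import permutations

    def edge_weights(instances):
        # instances: list of feature sets; weight (i, j) ~ P(f_j | f_i)
        single, pair = Counter(), Counter()
        for feats in instances:
            single.update(feats)
            pair.update(permutations(feats, 2))
        return {(i, j): pair[(i, j)] / single[i] for (i, j) in pair}

    def expand(instance, weights, top=5, thresh=0.1):
        # append likely-missing features with high conditional probability
        scores = Counter()
        for (a, b), w in weights.items():
            if a in instance and b not in instance and w >= thresh:
                scores[b] += w
        return set(instance) | {f for f, _ in scores.most_common(top)}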


Adversarial Extreme Multi-label Classification

arXiv.org Machine Learning

The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels. Datasets in extreme classification exhibit a long tail of labels which have a small number of positive training instances. In this work, we pose the learning task in extreme classification with a large number of tail labels as learning in the presence of adversarial perturbations. This view motivates a robust optimization framework and an equivalence to a corresponding regularized objective. Under the proposed robustness framework, we demonstrate the efficacy of Hamming loss for tail-label detection in extreme classification. The equivalent regularized objective, in combination with proximal-gradient-based optimization, performs better than state-of-the-art methods on propensity-scored versions of precision@k and nDCG@k (up to 20% relative improvement over PFastreXML, a leading tree-based approach, and 60% relative improvement over SLEEC, a leading label-embedding approach). Furthermore, we highlight the sub-optimality of a sparse solver in a widely used package for large-scale linear classification, which is interesting in its own right. We also investigate the spectral properties of label graphs to provide novel insights into the conditions governing the performance of the Hamming-loss-based one-vs-rest scheme vis-à-vis label embedding methods.
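
The regularized objective is minimized per label with proximal gradient methods; a generic proximal step, shown here with an L1 regularizer as a stand-in (the paper's actual regularizer follows from its robustness analysis), is:

    import numpy as np

    def prox_grad_step(w, grad, step, lam):
        # gradient step on the smooth part of the loss, followed by
        # soft-thresholding, the proximal operator of the L1 norm
        z = w - step * grad
        return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)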


Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification

arXiv.org Machine Learning

Machine learning has played an important role in information retrieval (IR) in recent years. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million), so the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with performance equivalent to SOTA models, using a single thread and a smaller storage footprint. In particular, our method outperforms the SOTA models on a dataset with 3 million labels.
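
In vector-space-model terms, prediction reduces to sparse dot products between the query document and the training documents, followed by a similarity-weighted vote over neighbor labels; a compact sketch assuming scipy.sparse CSR inputs (parameter values illustrative, not the paper's exact weighting):

    import numpy as np

    def predict_labels(x, X_train, Y_train, k=20, top=5):
        # x: (1, d) sparse query; X_train: (n, d) sparse, rows L2-normalized;
        # Y_train: (n, L) sparse binary label matrix
        sims = (X_train @ x.T).toarray().ravel()   # cosine similarities
        nn = np.argsort(-sims)[:k]                 # k nearest training docs
        scores = Y_train[nn].T @ sims[nn]          # similarity-weighted votes
        return np.argsort(-scores)[:top]           # highest-scoring labels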


Scale Up Event Extraction Learning via Automatic Training Data Generation

AAAI Conferences

The task of event extraction has long been investigated in a supervised learning paradigm, which is bound by the number and the quality of the training instances. Existing training data must be generated manually through a combination of expert domain knowledge and extensive human involvement. However, due to the substantial effort required to annotate text, the resulting datasets are usually small, which severely affects the quality of the learned model and makes it hard to generalize. Our work develops an automatic approach for generating training data for event extraction. Our approach allows us to scale up event extraction training instances from thousands to hundreds of thousands, and it does this at a much lower cost than a manual approach. We achieve this by employing distant supervision to automatically create event annotations from unlabelled text using existing structured knowledge bases or tables. We then develop a neural network model with post inference to transfer the knowledge extracted from structured knowledge bases and automatically annotate typed events with corresponding arguments in text. We evaluate our approach by using the knowledge extracted from Freebase to label texts from Wikipedia articles. Experimental results show that our approach can generate a large number of high-quality training instances. We show that this large volume of training data not only leads to a better event extractor, but also allows us to detect multiple typed events.
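
The distant-supervision step can be caricatured as aligning each knowledge-base fact with sentences that mention all of its key arguments (a deliberately naive sketch; the paper builds a neural model with post inference on top of such alignments):

    def distant_annotations(kb_facts, sentences):
        # kb_facts: list of (event_type, key_args) tuples, e.g. from Freebase
        # sentences: list of raw sentence strings, e.g. from Wikipedia
        annotations = []
        for event_type, key_args in kb_facts:
            for sent in sentences:
                # label a sentence only if every key argument is mentioned
                if all(arg in sent for arg in key_args):
                    annotations.append((sent, event_type, key_args))
        return annotations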


Dictionary Learning in Optimal Metric Space

AAAI Conferences

Dictionary learning has been widely used in the machine learning field to address many real-world applications, such as classification and denoising. In recent years, many new dictionary learning methods have been proposed. Most of them are designed to solve either unsupervised problems without any prior information or supervised problems with full label information. In the real world, however, we can often obtain only limited side information as prior knowledge rather than label information. Existing methods do not take this side information into account, let alone use it to learn a better dictionary. To tackle this, we propose a new unified unsupervised model that naturally integrates metric learning into dictionary learning, fully utilizing the side information. The proposed method updates the metric space and the dictionary adaptively and alternately, ensuring that an optimal metric space and dictionary are learned simultaneously. Moreover, our method also deals well with high-dimensional data. Extensive experiments show the effectiveness of the proposed method, which achieves better performance in real-world image clustering applications.
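
The alternating scheme could be outlined as below, with update_metric standing in for any metric learner driven by must-link/cannot-link side information (a schematic outline under these assumptions, not the paper's exact updates):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    def alternate_learn(X, update_metric, n_atoms=50, n_iters=5):
        # update_metric: hypothetical metric learner returning a linear
        # transform L estimated from the side-information constraints
        L = np.eye(X.shape[1])
        dl = None
        for _ in range(n_iters):
            Z = X @ L.T                          # map data into metric space
            dl = DictionaryLearning(n_components=n_atoms).fit(Z)
            codes = dl.transform(Z)              # sparse codes under current L
            L = update_metric(X, codes)          # refine metric given codes
        return L, dl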