

Using k-Way Co-Occurrences for Learning Word Embeddings

AAAI Conferences

Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, much prior work on word embedding learning has used co-occurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and co-occurring in the same context. We extend the notion of co-occurrences to cover k(≥2)-way co-occurrences among a set of k words. Specifically, we prove a theoretical relationship between the joint probability of k(≥2) words and the sum of ℓ2 norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises k-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and that, despite data sparsity, for smaller values of k (≤5), k-way embeddings perform comparably to or better than 2-way embeddings in a range of tasks.
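For context, the corresponding 2-way relationship is known from the random-walk language model analysis of Arora et al. (2016), which this work generalises to k words. A hedged sketch is below, where v_w is the d-dimensional embedding of word w and Z is the model's partition function; the exact k-way statement and its constants are established in the paper itself. Expanding the squared norm shows how the joint probability decomposes into per-word ℓ2 norms plus pairwise interactions:

```latex
\log p(w_1, w_2) \;\approx\; \frac{\lVert v_{w_1} + v_{w_2} \rVert_2^2}{2d} \;-\; 2\log Z,
\qquad
\lVert v_{w_1} + v_{w_2} \rVert_2^2 \;=\; \lVert v_{w_1} \rVert_2^2 + \lVert v_{w_2} \rVert_2^2 + 2\, v_{w_1}^{\top} v_{w_2}.
```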


Exact Clustering via Integer Programming and Maximum Satisfiability

AAAI Conferences

We consider the following general graph clustering problem: given a complete undirected graph G = (V, E, c) with an edge weight function c : E → Q, we are asked to find a partition C of V that maximizes the sum of edge weights within the clusters in C. Owing to its high generality, this problem has a wide variety of real-world applications, including correlation clustering, group technology, and community detection. In this study, we investigate the design of mathematical programming formulations and constraint satisfaction formulations for the problem. First, we present a novel integer linear programming (ILP) formulation that has far fewer constraints than the standard ILP formulation by Grötschel and Wakabayashi (1989). Second, we propose an ILP-based exact algorithm that solves an ILP problem obtained by modifying our above ILP formulation and then performs simple post-processing to produce an optimal solution to the original problem. Third, we present maximum satisfiability (MaxSAT) counterparts of both our ILP formulation and our ILP-based exact algorithm. Computational experiments using well-known real-world datasets demonstrate that our ILP-based approaches and their MaxSAT counterparts are highly effective in terms of both memory efficiency and computation time.
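For reference, a sketch of the standard ILP formulation of Grötschel and Wakabayashi (1989) that the abstract compares against: a binary variable x_{uv} indicates whether u and v share a cluster, and transitivity is enforced by O(|V|³) triangle constraints, which is exactly the constraint count the new formulation reduces:

```latex
\begin{aligned}
\max\;\; & \sum_{\{u,v\} \in E} c_{uv}\, x_{uv} \\
\text{s.t.}\;\; & x_{uv} + x_{vw} - x_{uw} \le 1 \quad \text{for all distinct } u, v, w \in V, \\
& x_{uv} \in \{0, 1\} \quad \text{for all } \{u,v\} \in E.
\end{aligned}
```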


TorusE: Knowledge Graph Embedding on a Lie Group

AAAI Conferences

Knowledge graphs are useful for many artificial intelligence (AI) tasks. However, knowledge graphs often have missing facts. To populate the graphs, knowledge graph embedding models have been developed. Knowledge graph embedding models map entities and relations in a knowledge graph to a vector space and predict unknown triples by scoring candidate triples. TransE is the first translation-based method, and it is well known for its simplicity and efficiency in knowledge graph completion. It employs the principle that the differences between entity embeddings represent their relations. The principle seems very simple, but it can effectively capture the rules of a knowledge graph. However, TransE has a problem with its regularization: TransE forces entity embeddings to lie on a sphere in the embedding vector space. This regularization warps the embeddings, making it difficult for them to fulfill the abovementioned principle, and it also adversely affects the accuracy of link prediction. On the other hand, regularization is important because, without it, entity embeddings diverge under training with negative sampling. This paper proposes a novel embedding model, TorusE, to solve the regularization problem. The principle of TransE can be defined on any Lie group. A torus, which is one of the compact Lie groups, can be chosen as the embedding space to avoid regularization. To the best of our knowledge, TorusE is the first model that embeds objects in a space other than a real or complex vector space, and this paper is the first to formally discuss the problem of regularization in TransE. Our approach outperforms other state-of-the-art approaches such as TransE, DistMult, and ComplEx on a standard link prediction task. We show that TorusE is scalable to large knowledge graphs and is faster than the original TransE.
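As a hedged sketch of the principle involved: TransE scores a triple (h, r, t) by how well the relation embedding translates the head to the tail, and on a torus the same translation is well defined while compactness removes the need for a norm constraint. The distance below is the natural quotient metric on T^n = R^n / Z^n; the scoring functions TorusE actually uses are defined in the paper:

```latex
f(h, r, t) \;=\; \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert
\quad \text{(TransE: for a true triple, } \mathbf{h} + \mathbf{r} \approx \mathbf{t}\text{)},
\qquad
d([\mathbf{x}], [\mathbf{y}]) \;=\; \min_{\mathbf{z} \in \mathbb{Z}^n} \lVert \mathbf{x} - \mathbf{y} + \mathbf{z} \rVert .
```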


Video-Based Person Re-Identification via Self Paced Weighting

AAAI Conferences

Person re-identification (re-id) is a fundamental technique for associating person images, captured by different surveillance cameras, with the same person. Compared to single-image-based person re-id methods, video-based person re-id has attracted widespread attention because the extra spatio-temporal information and additional appearance cues can be used to greatly improve matching performance. However, most existing video-based person re-id methods treat all video frames equally, ignoring their quality discrepancies caused by object occlusion and motion, which are common phenomena in real surveillance scenarios. Based on this observation, we propose a novel video-based person re-id method via self-paced weighting (SPW). First, we propose a self-paced outlier detection method to evaluate the noise degree of video subsequences. Thereafter, a weighted multi-pair distance metric learning approach is adopted to measure the distance between two person image sequences. Experimental results on two public datasets demonstrate the superiority of the proposed method over current state-of-the-art work.
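A minimal sketch, not the paper's actual SPW algorithm, of the general idea that noisier frames should contribute less to a sequence-to-sequence distance. The outlier score used here (distance to the sequence's mean feature) and the exponential weighting are illustrative assumptions standing in for the paper's self-paced scores and learned metric:

```python
import numpy as np

def frame_weights(seq, temperature=1.0):
    """Down-weight frames far from the sequence's mean feature
    (a stand-in for self-paced outlier scores)."""
    center = seq.mean(axis=0)
    noise = np.linalg.norm(seq - center, axis=1)  # per-frame outlier score
    w = np.exp(-noise / temperature)              # noisier frame -> smaller weight
    return w / w.sum()

def weighted_seq_distance(seq_a, seq_b):
    """Weighted average of pairwise frame distances between two sequences."""
    wa, wb = frame_weights(seq_a), frame_weights(seq_b)
    diff = seq_a[:, None, :] - seq_b[None, :, :]  # (na, nb, d)
    d = np.linalg.norm(diff, axis=2)              # pairwise frame distances
    return float(wa @ d @ wb)                     # each pair weighted by wa_i * wb_j

# Toy usage: two sequences of 8 and 10 frames with 64-dim features.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(8, 64)), rng.normal(size=(10, 64))
print(weighted_seq_distance(a, b))
```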


Prerequisite Skills for Reading Comprehension: Multi-Perspective Analysis of MCTest Datasets and Systems

AAAI Conferences

One of the main goals of natural language processing (NLP) is synthetic understanding of natural language documents, especially reading comprehension (RC). An obstacle to the further development of RC systems is the absence of a synthetic methodology to analyze their performance. It is difficult to examine the performance of systems based solely on their results for tasks because the process of natural language understanding is complex. In order to tackle this problem, we propose in this paper a methodology inspired by unit testing in software engineering that enables the examination of RC systems from multiple aspects. Our methodology consists of three steps. First, we define a set of prerequisite skills for RC based on existing NLP tasks. We assume that RC capability can be divided into these skills. Second, we manually annotate a dataset for an RC task with information regarding the skills needed to answer each question. Finally, we analyze the performance of RC systems for each skill based on the annotation. The last two steps highlight two aspects: the characteristics of the dataset, and the weaknesses in and differences among RC systems. We tested the effectiveness of our methodology by annotating the Machine Comprehension Test (MCTest) dataset and analyzing four existing systems (including a neural system) on it. The results of the annotations showed that answering questions requires a combination of skills, and clarified the kinds of capabilities that systems need to understand natural language. We conclude that the set of prerequisite skills we define is promising for the decomposition and analysis of RC.


Optimal Pricing for Submodular Valuations with Bounded Curvature

AAAI Conferences

The optimal pricing problem is a fundamental problem that arises in combinatorial auctions. Suppose that there is one seller who has indivisible items and multiple buyers who want to purchase a combination of the items. The seller wants to sell his items at the highest possible prices, and each buyer wants to maximize his utility (i.e., valuation minus payment) as long as his payment does not exceed his budget. The optimal pricing problem seeks a price for each item and an assignment of items to buyers such that every buyer achieves the maximum utility under the prices. The goal of the problem is to maximize the total payment from the buyers. In this paper, we consider the case in which the valuations are submodular. We show that the problem is computationally hard even if there exists only one buyer. We then propose approximation algorithms for the unlimited budget case. We also extend the algorithms to the limited budget case with a single buyer, and to the case in which multiple buyers collaborate with each other.
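For reference, the standard definitions behind the title: a set function f : 2^V → R≥0 is submodular if it has diminishing marginal returns, and its (total) curvature c ∈ [0, 1] measures how far f is from being additive, with c = 0 for modular functions; "bounded curvature" means c is bounded away from 1:

```latex
f(S \cup \{i\}) - f(S) \;\ge\; f(T \cup \{i\}) - f(T)
\quad \text{for all } S \subseteq T \subseteq V,\; i \in V \setminus T,
\qquad
c \;=\; 1 - \min_{i \in V} \frac{f(V) - f(V \setminus \{i\})}{f(\{i\})}.
```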


Enumerate Lasso Solutions for Feature Selection

AAAI Conferences

We propose an algorithm for enumerating solutions to the Lasso regression problem. In ordinary Lasso regression, one global optimum is obtained, and the resulting features are interpreted as task-relevant features. However, this can overlook possibly relevant features not selected by the Lasso. With the proposed method, we can enumerate many possible feature sets for human inspection, thus recording all the important features. We prove that, by enumerating solutions, we can recover the true feature set exactly under less restrictive conditions than those required by the ordinary Lasso. We also confirm our theoretical results in numerical simulations. Finally, on gene expression and text data, we demonstrate that the proposed method can enumerate a wide variety of meaningful feature sets that are overlooked by the global optimum.
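A much-simplified sketch of the enumeration idea, not the paper's exact algorithm: solve the Lasso, then branch in Lawler's style by forbidding one selected feature at a time and re-solving, which yields a stream of distinct feature sets in increasing objective value. scikit-learn's Lasso is assumed as the base solver:

```python
import heapq
import itertools
import numpy as np
from sklearn.linear_model import Lasso

def lasso_support(X, y, alpha, forbidden):
    """Fit the Lasso with the features in `forbidden` forced out;
    return the Lasso objective value and the selected support."""
    mask = np.ones(X.shape[1], dtype=bool)
    mask[list(forbidden)] = False
    if not mask.any():
        return np.inf, frozenset()
    coef = np.zeros(X.shape[1])
    coef[mask] = Lasso(alpha=alpha).fit(X[:, mask], y).coef_
    n = X.shape[0]
    obj = np.sum((y - X @ coef) ** 2) / (2 * n) + alpha * np.abs(coef).sum()
    return obj, frozenset(np.flatnonzero(coef))

def enumerate_lasso(X, y, alpha=0.1, k=5):
    """Yield up to k distinct supports in increasing objective value,
    branching Lawler-style by forbidding one selected feature at a time."""
    tie = itertools.count()  # tiebreaker so the heap never compares sets
    obj, supp = lasso_support(X, y, alpha, frozenset())
    heap, seen = [(obj, next(tie), frozenset(), supp)], set()
    while heap and len(seen) < k:
        obj, _, forbidden, supp = heapq.heappop(heap)
        if supp in seen or not np.isfinite(obj):
            continue
        seen.add(supp)
        yield obj, sorted(supp)
        for j in supp:  # forbid one selected feature and re-solve
            c_obj, c_supp = lasso_support(X, y, alpha, forbidden | {j})
            heapq.heappush(heap, (c_obj, next(tie), forbidden | {j}, c_supp))

# Toy usage: correlated relevant features, so alternative supports exist.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=50)  # feature 2 mimics feature 0
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=50)
for obj, features in enumerate_lasso(X, y, alpha=0.05, k=3):
    print(f"objective={obj:.4f}, features={features}")
```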


Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling

AAAI Conferences

We propose a scalable and efficient algorithm for co-clustering a higher-order tensor. Viewing a tensor as a hypergraph, we formulate the co-clustering of a tensor as the problem of partitioning the corresponding hypergraph. Our algorithm is based on the random sampling technique, which has been successfully applied to graph cut problems. We extend a random sampling algorithm for the graph multiway cut problem to hypergraphs, and design a co-clustering algorithm based on it. Each iteration of our algorithm runs in time polynomial in the size of the hypergraph, and thus it performs well even for higher-order tensors, which are difficult for state-of-the-art algorithms to handle.
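A hedged reading of the tensor-to-hypergraph view (the precise construction and objective are the paper's): each mode's index set contributes vertices, and each nonzero entry contributes a hyperedge over the indices it touches, so that a good co-clustering corresponds to a partition of the vertices cutting little hyperedge weight:

```latex
\text{For } \mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_k}:\quad
V = \bigsqcup_{m=1}^{k} [n_m], \qquad
E = \bigl\{ \{i_1, \dots, i_k\} : \mathcal{X}_{i_1 \cdots i_k} \neq 0 \bigr\}, \qquad
w(e) = |\mathcal{X}_{i_1 \cdots i_k}| .
```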


Expected Tensor Decomposition with Stochastic Gradient Descent

AAAI Conferences

In this study, we investigate expected CP decomposition, a special case of CP decomposition in which the tensor to be decomposed is given as the sum or average of tensor samples X(t) for t = 1, ..., T. To compute this decomposition, we develop stochastic-gradient-descent-type algorithms with four appealing features: efficient memory use, the ability to work in an online setting, robustness to parameter tuning, and simplicity. Our theoretical analysis shows that the solutions do not diverge to infinity for any initial value or step size. Experimental results confirm that our algorithms significantly outperform all existing methods in terms of accuracy. We also show that they can successfully decompose a large tensor containing billions of nonzero elements.
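A minimal NumPy sketch, under simplifying assumptions, of stochastic-gradient CP decomposition of an expected 3-way tensor: each step draws one sample X(t) and takes a gradient step on the squared reconstruction error of the rank-R model [[A, B, C]]. The paper's actual algorithms, step-size rules, and non-divergence guarantees are not reproduced here:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product: (J*K, R) from (J, R) and (K, R)."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def sgd_cp(sample_tensor, shape, rank, steps=2000, lr=5e-3, seed=0):
    """SGD on E[||X - [[A, B, C]]||_F^2], one tensor sample X(t) per step."""
    I, J, K = shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.normal(scale=0.1, size=(n, rank)) for n in (I, J, K))
    for _ in range(steps):
        X = sample_tensor()                          # draw one sample X(t)
        E = X - np.einsum('ir,jr,kr->ijk', A, B, C)  # residual tensor
        # Mode-wise gradients via unfoldings matched to Khatri-Rao row order.
        gA = E.reshape(I, J * K) @ khatri_rao(B, C)
        gB = E.transpose(1, 0, 2).reshape(J, I * K) @ khatri_rao(A, C)
        gC = E.transpose(2, 0, 1).reshape(K, I * J) @ khatri_rao(A, B)
        A += lr * gA
        B += lr * gB
        C += lr * gC
    return A, B, C

# Toy usage: noisy samples of a fixed rank-2 ground-truth tensor.
rng = np.random.default_rng(1)
At, Bt, Ct = (rng.normal(size=(n, 2)) for n in (4, 5, 6))
truth = np.einsum('ir,jr,kr->ijk', At, Bt, Ct)
A, B, C = sgd_cp(lambda: truth + 0.01 * rng.normal(size=truth.shape),
                 truth.shape, rank=2)
print(np.linalg.norm(truth - np.einsum('ir,jr,kr->ijk', A, B, C)))
```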


Joint Word Representation Learning Using a Corpus and a Semantic Lexicon

AAAI Conferences

Methods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as semantic similarity measurement and word analogy detection. Despite their success, these data-driven word representation learning methods do not consider the rich semantic relational structure between words in a co-occurring context. On the other hand, much manual effort has already gone into the construction of semantic lexicons, such as WordNet, that represent the meanings of words by defining the various relationships that exist among the words in a language. We consider the question: can we improve the word representations learnt from a corpus by integrating the knowledge in semantic lexicons? For this purpose, we propose a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence, subject to the relational constraints given by the semantic lexicon. Specifically, we use the relations that exist between words in the lexicon to regularize the word representations learnt from the corpus. Our proposed method statistically significantly outperforms previously proposed methods for incorporating semantic lexicons into word representations on several benchmark datasets for semantic similarity and word analogy.
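A hedged sketch of the general shape of such a joint objective, not the paper's exact formulation: a GloVe-style corpus term over co-occurrence counts X_{ij} with a weighting function h, plus a lexicon term that penalizes distance between the representations of word pairs 𝓛 that the lexicon relates, balanced by a coefficient λ:

```latex
J \;=\; \sum_{(i,j)\,:\,X_{ij} > 0} h(X_{ij}) \bigl( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \bigr)^2
\;+\; \lambda \sum_{(i,j) \in \mathcal{L}} \lVert w_i - w_j \rVert_2^2 .
```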