Data-Dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion

AAAI Conferences

Embedding-based methods for knowledge base completion (KBC) learn representations of entities and relations in a vector space, along with a scoring function that estimates the likelihood of relations between entities. The learnable class of scoring functions is designed to be expressive enough to cover a variety of real-world relations, but this expressiveness comes at the cost of an increased number of parameters. In particular, parameters in these methods are superfluous for relations that are either symmetric or antisymmetric. To mitigate this problem, we propose a new L1 regularizer for Complex Embeddings, which is one of the state-of-the-art embedding-based methods for KBC. This regularizer promotes symmetry or antisymmetry of the scoring function on a relation-by-relation basis, in accordance with the observed data. Our empirical evaluation shows that the proposed method outperforms the original Complex Embeddings and other baseline methods on the FB15k dataset.
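As a rough illustration of why some parameters are superfluous for symmetric or antisymmetric relations, the sketch below uses the standard Complex Embeddings (ComplEx) score, the real part of the trilinear product of the relation vector, the subject vector, and the conjugated object vector. A purely real relation vector yields a symmetric score and a purely imaginary one yields an antisymmetric score. The function `multiplicative_l1` is a hypothetical penalty written for this example (it vanishes exactly when each coordinate is purely real or purely imaginary); it is an assumption about what such a regularizer could look like, not necessarily the one proposed in the paper.

```python
import numpy as np


def complex_score(e_s, w_r, e_o):
    """ComplEx score: Re(sum_j w_r[j] * e_s[j] * conj(e_o[j]))."""
    return float(np.real(np.sum(w_r * e_s * np.conj(e_o))))


def multiplicative_l1(w_r):
    """Hypothetical penalty (an assumption for illustration only):
    sum_j |Re(w_r[j])| * |Im(w_r[j])|.
    It is zero when every coordinate is purely real or purely imaginary."""
    return float(np.sum(np.abs(w_r.real) * np.abs(w_r.imag)))


rng = np.random.default_rng(0)
dim = 4

# Two random complex entity embeddings.
e_a = rng.normal(size=dim) + 1j * rng.normal(size=dim)
e_b = rng.normal(size=dim) + 1j * rng.normal(size=dim)

# Purely real relation vector: score(a, b) == score(b, a).
w_sym = rng.normal(size=dim) + 0j
# Purely imaginary relation vector: score(a, b) == -score(b, a).
w_anti = 1j * rng.normal(size=dim)

sym_gap = complex_score(e_a, w_sym, e_b) - complex_score(e_b, w_sym, e_a)
anti_sum = complex_score(e_a, w_anti, e_b) + complex_score(e_b, w_anti, e_a)

print("symmetric gap:", sym_gap)
print("antisymmetric sum:", anti_sum)
print("penalty (real, imaginary):",
      multiplicative_l1(w_sym), multiplicative_l1(w_anti))
```

Both the gap and the sum should be zero up to floating-point error, which shows that the imaginary half of a symmetric relation's vector (or the real half of an antisymmetric one) carries no needed information.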


Location-Sensitive User Profiling Using Crowdsourced Labels

AAAI Conferences

In this paper, we investigate the impact of spatial variation on the construction of location-sensitive user profiles. We demonstrate evidence of spatial variation over a collection of Twitter Lists, wherein we find that crowdsourced labels are constrained by distance. For example, energy in San Francisco is more associated with the green movement, whereas in Houston it is more associated with oil and gas. We propose a three-step framework for location-sensitive user profiling: first, it constructs a crowdsourced label similarity graph, where each labeler and labelee is annotated with a geographic coordinate; second, it transforms this similarity graph into a directed weighted tree that imposes a hierarchical structure over these labels; third, it embeds this location-sensitive folksonomy into a user profile ranking algorithm that outputs a ranked list of candidate labels for a partially observed user profile. Through extensive experiments over a Twitter list dataset, we demonstrate the effectiveness of this location-sensitive user profiling.


TipMaster: A Knowledge Base of Authoritative Local News Sources on Social Media

AAAI Conferences

Twitter has become an important online source for real-time news dissemination. In particular, official accounts of local governments and media outlets provide newsworthy and authoritative information, revealing local trends and breaking news. In this paper, we describe TipMaster, an automatically constructed knowledge base of Twitter accounts that are likely to report local news, from government agencies to local media outlets. First, we implement classifiers for detecting these accounts by integrating heterogeneous information from the accounts' textual metadata, profile images, and their tweet messages. Next, we demonstrate two use cases for TipMaster: 1) as a platform that monitors real-time social media messages for local breaking news, and 2) as an authoritative source for verifying nascent rumors. Experimental results show that our account classification algorithms achieve both high precision and recall (around 90%). The demonstrated case studies prove that our platform is able to detect local breaking news or debunk emergent rumors faster than mainstream media sources.


Two Knowledge-based Methods for High-Performance Sense Distribution Learning

AAAI Conferences

Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org.


Knowledge-based Word Sense Disambiguation using Topic Models

AAAI Conferences

Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing, which is particularly challenging and useful in the unsupervised setting where all the words in any given text need to be disambiguated without using any labeled data. Typically, WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the formalism of topic models to design a WSD system that scales linearly with the number of words in the context. As a result, our system is able to utilize the whole document as the context for a word to be disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further utilize the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior to the document distribution over synsets. We evaluate the proposed method on Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015 English All-Word WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin.


Neural Knowledge Acquisition via Mutual Attention Between Knowledge Graph and Text

AAAI Conferences

We propose a general joint representation learning framework for knowledge acquisition (KA) on two tasks, knowledge graph completion (KGC) and relation extraction (RE) from text. In this framework, we learn representations of knowledge graphs (KGs) and text within a unified parameter sharing semantic space. To achieve better fusion, we propose an effective mutual attention between KGs and text. The reciprocal attention mechanism enables us to highlight important features and perform better KGC and RE. Different from conventional joint models, no complicated linguistic analysis or strict alignments between KGs and text are required to train our models. Experiments on relation extraction and entity link prediction show that models trained under our joint framework are significantly improved in comparison with other baselines. Most existing methods for KGC and RE can be easily integrated into our framework due to its flexible architectures. The source code of this paper can be obtained from https://github.com/thunlp/JointNRE.


Approximate and Exact Enumeration of Rule Models

AAAI Conferences

In machine learning, rule models are one of the most popular choices when model interpretability is the primary concern. Ordinarily, a single model is obtained by solving an optimization problem, and the resulting model is interpreted as the one that best explains the data. In this study, instead of finding a single rule model, we propose algorithms for enumerating multiple rule models. Model enumeration is useful in practice when (i) users want to choose a model that is particularly suited to their task knowledge, or (ii) users want to obtain several possible mechanisms that could be underlying the data to use as hypotheses for further scientific studies. To this end, we propose two enumeration algorithms: an approximate algorithm and an exact algorithm. We prove that these algorithms can enumerate models in descending order of their objective function values, approximately and exactly, respectively. We then confirm our theoretical results through experiments on real-world data. We also show that, by using the proposed enumeration algorithms, we can find several different models of almost equal quality.


Embedding of Hierarchically Typed Knowledge Bases

AAAI Conferences

Embedding has emerged as an important approach to prediction, inference, data mining and information retrieval based on knowledge bases, and various embedding models have been presented. Most of these models are "typeless," namely, they treat a knowledge base solely as a collection of instances without considering the types of the entities therein. In this paper, we investigate the use of entity type information for knowledge base embedding. We present a framework that augments a generic "typeless" embedding model to a typed one. The framework interprets an entity type as a constraint on the set of all entities and lets these type constraints induce isomorphically a set of subsets in the embedding space. Additional cost functions are then introduced to model the fitness between these constraints and the embedding of entities and relations. A concrete example scheme of the framework is proposed. We demonstrate experimentally that this framework offers improved embedding performance over the typeless models and other typed models.


An overview of embedding models of entities and relationships for knowledge base completion

arXiv.org Artificial Intelligence

Knowledge bases (KBs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform knowledge base completion or link prediction, i.e., predict whether a relationship not in the knowledge base is likely to be true. This article serves as a brief overview of embedding models of entities and relationships for knowledge base completion, summarizing up-to-date experimental results on standard benchmark datasets FB15k, WN18, FB15k-237, WN18RR, FB13 and WN11.


Modeling Variations of First-Order Horn Abduction in Answer Set Programming

arXiv.org Artificial Intelligence

We study abduction in First Order Horn logic theories where all atoms can be abduced and we are looking for preferred solutions with respect to three objective functions: cardinality minimality, coherence, and weighted abduction. We represent this reasoning problem in Answer Set Programming (ASP), in order to obtain a flexible framework for experimenting with global constraints and objective functions, and to test the boundaries of what is possible with ASP. Realizing this problem in ASP is challenging as it requires value invention and equivalence between certain constants, because the Unique Names Assumption does not hold in general. To permit reasoning in cyclic theories, we formally describe fine-grained variations of limiting Skolemization. We identify term equivalence as a main instantiation bottleneck, and improve the efficiency of our approach with on-demand constraints that were used to eliminate the same bottleneck in state-of-the-art solvers. We evaluate our approach experimentally on the ACCEL benchmark for plan recognition in Natural Language Understanding. Our encodings are publicly available, modular, and our approach is more efficient than state-of-the-art solvers on the ACCEL benchmark.