 probase


Towards a Framework for Visual Intelligence in Service Robotics: Epistemic Requirements and Gap Analysis

arXiv.org Artificial Intelligence

A key capability required by service robots operating in real-world, dynamic environments is that of Visual Intelligence, i.e., the ability to use their vision system, reasoning components and background knowledge to make sense of their environment. In this paper, we analyze the epistemic requirements for Visual Intelligence, both in a top-down fashion, using existing frameworks for human-like Visual Intelligence in the literature, and from the bottom up, based on the errors emerging from object recognition trials in a real-world robotic scenario. Finally, we use these requirements to evaluate current knowledge bases for Service Robotics and to identify gaps in the support they provide for Visual Intelligence. These gaps provide the basis of a research agenda for developing more effective knowledge representations for Visual Intelligence.


Beyond Word Embeddings: Learning Entity and Concept Representations from Large Scale Knowledge Bases

arXiv.org Artificial Intelligence

Text representations using neural word embeddings have proven effective in many NLP applications. Recent research adapts traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating the knowledge about concepts from two large-scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to seamlessly learn from the knowledge in Wikipedia text and the Probase concept graph. We evaluate our concept embedding models on two tasks: 1) analogical reasoning, where we achieve a state-of-the-art performance of 91% on semantic analogies, and 2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study to evaluate our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions than the tedious and error-prone methods that depend on gazetteers and regular expressions. In this paper, we use the terms "concept" and "entity" interchangeably. (Walid Shalaby et al.; Hongxia Jin, Samsung Research America, 665 Clyde Avenue, Mountain View, CA 94043, USA, hongxia.jin@samsung.com.) Figure 1: Integrating knowledge from Wikipedia text (left) and Probase concept graph (right). Local concept-concept, concept-word, and word-word contexts are generated from both KBs and used for training the skip-gram model.
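As a rough illustration of how local contexts from a text corpus and a concept graph could be pooled into one set of skip-gram training pairs, here is a minimal Python sketch. The abstract does not say how the contexts are actually generated, so the function names (`text_pairs`, `graph_pairs`), the window size, the symmetric treatment of graph edges, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

def text_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Emit (target, context) pairs from a token sequence.

    Assumption: multiword concept mentions were already merged into
    single tokens (e.g. "united_states") by an earlier step.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def graph_pairs(concept_graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Emit (concept, neighbor) pairs in both directions from an adjacency list."""
    pairs = []
    for concept, neighbors in concept_graph.items():
        for neighbor in neighbors:
            pairs.append((concept, neighbor))
            pairs.append((neighbor, concept))
    return pairs

# Hypothetical toy inputs: one sentence and a two-edge concept graph.
tokens = ["barack_obama", "was", "president", "of", "united_states"]
graph = {"president": ["barack_obama"], "country": ["united_states"]}

# Both sources feed the same pool of pairs, which a skip-gram trainer would consume.
training_pairs = text_pairs(tokens, window=1) + graph_pairs(graph)
```

Because both sources produce pairs of the same shape, a single skip-gram objective can be trained on their union without changing the model itself.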


Error Detection in a Large-Scale Lexical Taxonomy

arXiv.org Artificial Intelligence

Knowledge bases (KBs) are an important component of artificial intelligence. One significant challenge in KB construction is that the resulting KB contains considerable noise, which hinders its effective use. Although some KB cleansing algorithms have been proposed, they focus on the structure of the knowledge graph and neglect the relations between concepts, which could help uncover wrong relations in the KB. Motivated by this, we measure the relation between two concepts by the distance between their corresponding instances and detect errors within the intersection of the conflicting concept sets. For efficient and effective knowledge base cleansing, we first apply a distance-based model to determine the conflicting concept sets using two different methods. Then, we propose and analyze several algorithms for detecting and repairing the errors based on our model, using a hash method to calculate distances efficiently. Experimental results demonstrate that the proposed approaches can cleanse knowledge bases efficiently and effectively.
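The idea of measuring concept distance through instance sets and then inspecting the overlap of conflicting concepts can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the distance measure, so Jaccard distance is swapped in, the threshold of 0.8 is arbitrary, the distance is computed exactly rather than with the hash method the abstract mentions, and the toy KB is invented.

```python
from typing import Dict, Set, Tuple

def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def suspicious_instances(
    instances: Dict[str, Set[str]], threshold: float = 0.8
) -> Dict[Tuple[str, str], Set[str]]:
    """Return shared instances of concept pairs that are far apart.

    Assumption: two concepts whose instance sets are nearly disjoint are
    treated as conflicting, so anything in their intersection is flagged
    for review.
    """
    names = sorted(instances)
    flagged = {}
    for i, c1 in enumerate(names):
        for c2 in names[i + 1:]:
            shared = instances[c1] & instances[c2]
            if shared and jaccard_distance(instances[c1], instances[c2]) >= threshold:
                flagged[(c1, c2)] = shared
    return flagged

# Hypothetical toy KB in which "paris" was wrongly extracted as a fruit.
kb = {
    "city": {"paris", "london", "tokyo", "cairo", "lima"},
    "food": {"apple", "banana", "cherry", "mango", "bread"},
    "fruit": {"apple", "banana", "cherry", "mango", "paris"},
}
flagged = suspicious_instances(kb)
```

In this toy KB only the pair ("city", "fruit") is flagged, with "paris" as the shared instance. The sketch does not decide which of the two isA relations is wrong; that repair step would need further evidence.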


On the Transitivity of Hypernym-Hyponym Relations in Data-Driven Lexical Taxonomies

AAAI Conferences

Taxonomy is indispensable in understanding natural language. A variety of large-scale, usage-based, data-driven lexical taxonomies have been constructed in recent years. The hypernym-hyponym relationship, which is considered the backbone of lexical taxonomies, can not only be used to categorize data but also enables generalization. In particular, we focus on one of the most prominent properties of the hypernym-hyponym relationship, namely transitivity, which has significant implications for many applications. We show that, unlike in human-crafted ontologies and taxonomies, transitivity does not always hold in data-driven lexical taxonomies. We introduce a supervised approach to detect whether transitivity holds for any given pair of hypernym-hyponym relationships. Besides solving the inference problem, we also use transitivity to derive new hypernym-hyponym relationships for data-driven lexical taxonomies. We conduct extensive experiments to show the effectiveness of our approach.
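To make the transitivity question concrete, the following Python sketch enumerates the relations that transitivity would imply from pairs of existing isA edges. It only generates candidates; deciding whether each one actually holds is the job of the supervised approach described above, which is not reproduced here. The function name and the toy edges are illustrative assumptions.

```python
from typing import Set, Tuple

def transitive_candidates(is_a: Set[Tuple[str, str]]) -> Set[Tuple[str, str, str]]:
    """Return (a, b, c) triples where a isA b and b isA c are known
    but a isA c is not yet in the taxonomy."""
    hypernyms_of = {}
    for hyponym, hypernym in is_a:
        hypernyms_of.setdefault(hyponym, set()).add(hypernym)
    candidates = set()
    for a, b in is_a:
        for c in hypernyms_of.get(b, ()):
            if c != a and (a, c) not in is_a:
                candidates.add((a, b, c))
    return candidates

# Hypothetical toy taxonomy: "scientist" is used in two different senses.
edges = {
    ("einstein", "scientist"),
    ("scientist", "person"),
    ("scientist", "profession"),
}
candidates = transitive_candidates(edges)
```

Both ("einstein", "scientist", "person") and ("einstein", "scientist", "profession") come out as candidates, yet only the first is a sensible inference. That gap is why a per-pair decision is needed before deriving new relations.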


Graph-Based Wrong IsA Relation Detection in a Large-Scale Lexical Taxonomy

AAAI Conferences

Knowledge bases (KBs) play an important role in artificial intelligence. Much effort has gone into constructing web-scale knowledge bases, both manually and automatically. Compared with manually constructed KBs, automatically constructed KBs are broader but noisier. In this paper, we study the problem of improving the quality of automatically constructed web-scale knowledge bases, in particular, lexical taxonomies of isA relationships. We find that these taxonomies usually contain cycles, which are often introduced by incorrect isA relations. Inspired by this observation, we introduce two kinds of models to detect incorrect isA relations from cycles. The first one eliminates cycles by extracting directed acyclic graphs, and the other one eliminates cycles by grouping nodes into different levels. We implement our models on Probase, a state-of-the-art, automatically constructed, web-scale taxonomy. After processing tens of millions of relations, our models eliminate 74 thousand wrong relations with 91% accuracy.
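The observation that cycles point to incorrect isA relations can be illustrated with a plain depth-first search that reports the edge closing each cycle. This is a generic textbook technique, not the models proposed in the paper, and the graph below is a hypothetical example. Removing the reported edges leaves an acyclic graph.

```python
from typing import Dict, List, Tuple

def back_edges(graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return edges (u, v) where v is still on the DFS stack when u is explored.

    graph maps each hyponym to the list of its hypernyms.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    found: List[Tuple[str, str]] = []

    def visit(u: str) -> None:
        color[u] = GRAY
        for v in graph.get(u, []):
            state = color.get(v, WHITE)
            if state == GRAY:        # v is an ancestor of u: this edge closes a cycle
                found.append((u, v))
            elif state == WHITE:
                visit(v)
        color[u] = BLACK

    for node in list(graph):         # start nodes are tried in insertion order
        if color.get(node, WHITE) == WHITE:
            visit(node)
    return found

# Hypothetical toy taxonomy with one wrong relation: organism isA dog.
taxonomy = {
    "dog": ["animal"],
    "animal": ["organism"],
    "organism": ["dog"],
}
wrong = back_edges(taxonomy)
pruned = {u: [v for v in vs if (u, v) not in wrong] for u, vs in taxonomy.items()}
```

Which edge of a cycle gets reported depends on where the traversal starts, so a plain DFS can just as easily blame a correct relation. That limitation is one reason dedicated models for choosing the edge to remove are useful.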


Microsoft quietly delivers first preview of Graph Engine ZDNet

AITopics Original Links

It's been quite a while since there's been word regarding Microsoft Research's "Project Trinity," its graph database and computing platform. Graph Engine is a distributed, in-memory, large graph processing engine. It's a general-purpose computation engine that provides a unified declarative language for data modeling and message passing. It can be integrated with other system stacks via user-defined programming interfaces and RESTful interfaces. "Trinity (Graph Engine) supports online query processing and offline analytics on large graphs," explained a Microsoft Research page about the project.


Fine-Grained Semantic Conceptualization of FrameNet

AAAI Conferences

Understanding verbs is essential for many natural language tasks. To this end, large-scale lexical resources such as FrameNet have been manually constructed to annotate the semantics of verbs (frames) and their arguments (frame elements or FEs) in example sentences. Our goal is to "semantically conceptualize" example sentences by connecting FEs to knowledge base (KB) concepts. For example, connecting Employer FE to company concept in the KB enables the understanding that any (unseen) company can also be FE examples. However, a naive adoption of existing KB conceptualization technique, focusing on scenarios of conceptualizing a few terms, cannot 1) scale to many FE instances (average of 29.7 instances for all FEs) and 2) leverage interdependence between instances and concepts. We thus propose a scalable k-truss clustering and a Markov Random Field (MRF) model leveraging interdependence between concept-instance, concept-concept, and instance-instance pairs. Our extensive analysis with real-life data validates that our approach improves not only the quality of the identified concepts for FrameNet, but also that of applications such as selectional preference.
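For readers unfamiliar with the k-truss notion mentioned above, the sketch below computes the k-truss of a small undirected graph: the largest subgraph in which every edge takes part in at least k - 2 triangles. It is a naive peeling loop for illustration only, not the scalable clustering proposed in the paper, and the toy graph is invented.

```python
from typing import FrozenSet, Iterable, Set, Tuple

def k_truss(edges: Iterable[Tuple[str, str]], k: int) -> Set[FrozenSet[str]]:
    """Repeatedly drop edges supported by fewer than k - 2 triangles.

    Assumes an undirected simple graph (no self-loops).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Common neighbors of u and v = triangles containing edge (u, v).
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}

# Hypothetical toy graph: a 4-clique on a, b, c, d plus a dangling edge d-e.
toy_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"),
]
truss = k_truss(toy_edges, k=4)
```

With k = 4 the dangling edge d-e is peeled away because it lies in no triangle, while the six clique edges survive since each sits in two triangles.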


Verb Pattern: A Probabilistic Semantic Representation on Verbs

AAAI Conferences

Verbs are important in semantic understanding of natural language. Traditional verb representations, such as FrameNet, PropBank, VerbNet, focus on verbs' roles. These roles are too coarse to represent verbs' semantics. In this paper, we introduce verb patterns to represent verbs' semantics, such that each pattern corresponds to a single semantic of the verb. First we analyze the principles for verb patterns: generality and specificity. Then we propose a nonparametric model based on description length. Experimental results prove the high effectiveness of verb patterns. We further apply verb patterns to context-aware conceptualization, to show that verb patterns are helpful in semantic-related tasks.


Query Understanding through Knowledge-Based Conceptualization

AAAI Conferences

The goal of query conceptualization is to map instances in a query to concepts defined in a certain ontology or knowledge base. Queries usually do not observe the syntax of a written language, nor do they contain enough signals for statistical inference. However, the available context, i.e., the verbs related to the instances and the adjectives and attributes of the instances, does provide valuable clues for understanding instances. In this paper, we first mine a variety of relations among terms from a large web corpus and map them to related concepts using a probabilistic knowledge base. Then, for a given query, we conceptualize terms in the query using a random-walk-based iterative algorithm. Finally, we examine our method on real data and compare it to representative previous methods. The experimental results show that our method achieves higher accuracy and efficiency in query conceptualization.
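One common form of random-walk-based iteration is a random walk with restart over a graph that links query terms to candidate concepts. The sketch below shows that generic technique on an invented graph; the abstract does not specify the actual algorithm, so the graph, the restart probability, and the function name are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

def random_walk_with_restart(
    edges: Iterable[Tuple[str, str]],
    seeds: List[str],
    restart: float = 0.15,
    iterations: int = 50,
) -> Dict[str, float]:
    """Iteratively score nodes of an undirected graph by a walk that
    jumps back to the seed nodes with probability `restart`.

    Assumes every seed appears in at least one edge.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    restart_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in neighbors}
    scores = dict(restart_mass)
    for _ in range(iterations):
        nxt = {n: restart * restart_mass[n] for n in neighbors}
        for u, nbrs in neighbors.items():
            share = (1.0 - restart) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        scores = nxt
    return scores

# Hypothetical graph for the query "apple ipad": terms link to candidate
# concepts, and related concepts link to each other.
toy_graph = [
    ("apple", "fruit"),
    ("apple", "company"),
    ("ipad", "device"),
    ("company", "device"),
]
scores = random_walk_with_restart(toy_graph, seeds=["apple", "ipad"])
```

In this toy graph "company" ends up with a higher score than "fruit" for the term "apple", because "company" also receives mass through "device", which is reachable from the other query term.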


On Conceptual Labeling of a Bag of Words

AAAI Conferences

In natural language processing and information retrieval, the bag of words representation is used to implicitly represent the meaning of the text. Implicit semantics, however, are insufficient in supporting text or natural language based interfaces, which are adopted by an increasing number of applications. Indeed, in applications ranging from automatic ontology construction to question answering, explicit representation of semantics is starting to play a more prominent role. In this paper, we introduce the task of conceptual labeling (CL), which aims at generating a minimum set of conceptual labels that best summarize a bag of words. We draw the labels from a data-driven semantic network that contains millions of highly connected concepts. The semantic network provides meaning to the concepts, and in turn, it provides meaning to the bag of words through the conceptual labels we generate. To achieve our goal, we use an information-theoretic approach to trade off the semantic coverage of a bag of words against the minimality of the output labels. Specifically, we use Minimum Description Length (MDL) as the criterion for selecting the best concepts. Our extensive experimental results demonstrate the effectiveness of our approach in representing the explicit semantics of a bag of words.
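The trade-off between coverage and minimality can be sketched with a toy two-part code: a fixed number of bits per chosen label, plus the bits needed to encode each word given its best-fitting label. The encoding, the greedy search, the costs, and the probabilities below are all assumptions for illustration; the abstract does not describe the actual MDL formulation.

```python
import math
from typing import Dict, List, Tuple

Probs = Dict[str, Dict[str, float]]  # concept -> word -> P(word | concept)

def description_length(words: List[str], labels: List[str], p: Probs,
                       label_cost: float, uncovered_cost: float) -> float:
    """Bits for the labels plus bits for each word under its best label."""
    total = label_cost * len(labels)
    for w in words:
        best = max((p.get(c, {}).get(w, 0.0) for c in labels), default=0.0)
        total += -math.log2(best) if best > 0 else uncovered_cost
    return total

def greedy_labels(words: List[str], p: Probs, label_cost: float = 4.0,
                  uncovered_cost: float = 10.0) -> Tuple[List[str], float]:
    """Add the label that shortens the description most until none helps."""
    chosen: List[str] = []
    current = description_length(words, chosen, p, label_cost, uncovered_cost)
    while True:
        best_label, best_len = None, current
        for c in sorted(p):
            if c in chosen:
                continue
            length = description_length(words, chosen + [c], p, label_cost, uncovered_cost)
            if length < best_len:
                best_label, best_len = c, length
        if best_label is None:
            return chosen, current
        chosen.append(best_label)
        current = best_len

# Hypothetical probabilities: two specific concepts and one overly general one.
bag = ["china", "india", "apple", "banana"]
probs = {
    "country": {"china": 0.25, "india": 0.25},
    "fruit": {"apple": 0.25, "banana": 0.25},
    "thing": {w: 1 / 128 for w in bag},
}
labels, bits = greedy_labels(bag, probs)
```

Here the two specific labels "country" and "fruit" are selected at a total cost of 16 bits. The general label "thing" covers every word but encodes each one expensively, so adding it never shortens the description.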