Wikipedia provides an enormous amount of background knowledge for reasoning about the semantic relatedness between two entities. We propose Wikipedia-based Distributional Semantics for Entity Relatedness (DiSER), which represents the semantics of an entity by its distribution in a high-dimensional concept space derived from Wikipedia. DiSER measures the semantic relatedness between two entities by quantifying the distance between the corresponding high-dimensional vectors. DiSER builds its model from annotated entities only, and therefore improves over existing approaches, which do not distinguish between an entity and its surface form. We evaluate the approach on a benchmark that contains relative entity relatedness scores for 420 entity pairs. Our approach improves accuracy by 12% over state-of-the-art methods for computing entity relatedness. We also evaluate DiSER on the Entity Disambiguation task using a dataset of 50 sentences with highly ambiguous entity mentions, where it shows an improvement of 10% in precision over the best performing methods. To provide a resource that can be used to retrieve all entities related to a given entity, we construct a graph in which nodes represent Wikipedia entities and edges carry the relatedness scores. Wikipedia contains more than 4.1 million entities, which required efficient computation of relatedness scores for the corresponding 17 trillion entity pairs.
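The core relatedness computation described above can be sketched as cosine similarity between sparse concept vectors. A minimal illustration follows; the entity names, concepts, and weights are invented for this sketch, whereas DiSER derives its weights from annotated entity occurrences across Wikipedia:

```python
import math

# Hypothetical concept vectors mapping each entity to weighted
# Wikipedia concepts. In DiSER these weights come from annotated
# entity occurrences; the values below are toy numbers.
vectors = {
    "Apple_Inc.": {"Technology": 0.9, "IPhone": 0.8, "Fruit": 0.1},
    "Steve_Jobs": {"Technology": 0.7, "IPhone": 0.6, "Fruit": 0.0},
    "Banana":     {"Technology": 0.0, "IPhone": 0.0, "Fruit": 0.9},
}

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u.get(c, 0.0) * w for c, w in v.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Related entities score higher than unrelated ones.
tech_score = cosine(vectors["Apple_Inc."], vectors["Steve_Jobs"])
fruit_score = cosine(vectors["Apple_Inc."], vectors["Banana"])
```

Because the vectors are sparse, only the concepts present in both entities contribute to the dot product, which is what makes computation over trillions of entity pairs tractable in practice.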
In this paper we present an approach aimed at enriching the Open Information Extraction paradigm with semantic relation ontologization by integrating syntactic and semantic features into its workflow. To achieve this goal, we combine deep syntactic analysis and distributional semantics using a shortest path kernel method and soft clustering. The output of our system is a set of automatically discovered and ontologized semantic relations.
Zhou, Guangyou (Chinese Academy of Sciences) | Liu, Yang (Chinese Academy of Sciences) | Liu, Fang (Chinese Academy of Sciences) | Zeng, Daojian (Institute of Automation, Chinese Academy of Sciences) | Zhao, Jun (Chinese Academy of Sciences)
Community question answering (cQA), which provides a platform for people with diverse backgrounds to share information and knowledge, has become an increasingly popular research topic. In this paper, we focus on the task of question retrieval. The key problem of question retrieval is to measure the similarity between the queried questions and the historical questions which have been solved by other users. The traditional methods measure the similarity based on the bag-of-words (BOW) representation. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this work, we first propose a way to build a concept thesaurus based on the semantic relations extracted from the world knowledge of Wikipedia. Then, we develop a unified framework to leverage these semantic relations in order to enhance the question similarity in the concept space. Experiments conducted on a real cQA data set show that with the help of the Wikipedia thesaurus, the performance of question retrieval is improved as compared to the traditional methods.
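The concept-space idea can be sketched as follows: mapping synonymous words to a shared concept lets two differently worded questions overlap where raw bag-of-words matching fails. The thesaurus entries below are invented for illustration; the paper mines such a mapping from Wikipedia's semantic relations:

```python
# Toy concept thesaurus: surface word -> concept.
# The entries are invented for illustration; the paper builds
# this mapping from semantic relations in Wikipedia.
thesaurus = {
    "car": "Automobile", "automobile": "Automobile",
    "fix": "Repair", "repair": "Repair",
    "laptop": "Computer", "notebook": "Computer",
}

def concept_bag(question):
    """Map each token to its concept (or to itself if unknown)."""
    return {thesaurus.get(t, t) for t in question.lower().split()}

def jaccard(a, b):
    """Set-overlap similarity between two bags."""
    return len(a & b) / len(a | b) if a | b else 0.0

q1, q2 = "how to fix my car", "how to repair an automobile"
bow_sim = jaccard(set(q1.split()), set(q2.split()))          # overlap only on stop words
concept_sim = jaccard(concept_bag(q1), concept_bag(q2))      # synonyms now match
```

Here "fix"/"repair" and "car"/"automobile" collapse to shared concepts, so the concept-space similarity exceeds the raw BOW similarity for the same pair of questions.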
Bollegala, Danushka (The University of Liverpool) | Maehara, Takanori (National Institute of Informatics) | Yoshida, Yuichi (National Institute of Informatics) | Kawarabayashi, Ken-ichi (National Institute of Informatics)
Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are inter-connected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words, as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic relations between words.
Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSMs) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which accounts for homonymy and polysemy. The "sense-specific" prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluations on semantic relatedness, for both isolated words and words in sentential contexts, and on word sense induction demonstrate its effectiveness.
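The multi-prototype idea can be sketched by clustering the context vectors of an ambiguous word and averaging each cluster into one sense prototype. The contexts below are toy stand-ins for Wikipedia pages, and the seed-based assignment is a simplification; SaSA clusters real pages using both local and global context:

```python
import math

# Toy context vectors for pages mentioning the ambiguous word "bank"
# (invented for illustration; SaSA clusters actual Wikipedia pages).
contexts = [
    {"money": 2.0, "loan": 1.0},
    {"river": 2.0, "water": 1.0},
    {"money": 1.0, "credit": 2.0},
    {"river": 1.0, "shore": 2.0},
]

def cosine(u, v):
    """Cosine similarity between two sparse context vectors."""
    dot = sum(u.get(k, 0.0) * w for k, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def sense_prototypes(contexts, seeds):
    """Assign each context to its nearest seed, then average each
    cluster into one 'sense-specific' prototype vector."""
    clusters = [[] for _ in seeds]
    for ctx in contexts:
        best = max(range(len(seeds)), key=lambda i: cosine(ctx, seeds[i]))
        clusters[best].append(ctx)
    prototypes = []
    for group in clusters:
        proto = {}
        for ctx in group:
            for k, w in ctx.items():
                proto[k] = proto.get(k, 0.0) + w / len(group)
        prototypes.append(proto)
    return prototypes

# Two seeds -> two sense prototypes ("financial" vs. "geographic").
protos = sense_prototypes(contexts, seeds=[contexts[0], contexts[1]])
```

Each prototype then serves as one vector for one sense of the word, so a new occurrence can be matched against the prototype closest to its own context rather than against a single conflated vector.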