 Semantic Networks


Learning Multilingual Word Representations using a Bag-of-Words Autoencoder

arXiv.org Machine Learning

Recent work on learning multilingual word representations usually relies on word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignments. The autoencoder is trained to reconstruct the bag-of-words representation of a given sentence from an encoded representation extracted from its translation. We evaluate our approach on a multilingual document classification task, where labeled data is available only for one language (e.g. English) while classification must be performed in a different language (e.g. French). In our experiments, we observe that our method compares favorably with a previously proposed method that exploits word-level alignments to learn word representations.
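
To make the reconstruction target concrete, the following minimal sketch builds bag-of-words vectors for a sentence and its translation. It is only an illustration: the toy sentence pair, the whitespace tokenization, the use of counts rather than binary indicators, and the helper name `bag_of_words` are all assumptions, not details taken from the paper.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Return a count vector over `vocabulary` for a whitespace-tokenized sentence."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical toy sentence pair (not from the paper).
english = "the cat sat on the mat"
french = "le chat est assis sur le tapis"

# One vocabulary per language, sorted so that vector positions are stable.
en_vocab = sorted(set(english.split()))
fr_vocab = sorted(set(french.split()))

# Word order is discarded; only per-word counts remain.
en_bow = bag_of_words(english, en_vocab)
fr_bow = bag_of_words(french, fr_vocab)

print(en_vocab, en_bow)
print(fr_vocab, fr_bow)
```

Under these assumptions, a model of the kind described would encode one of the two vectors and be trained to reconstruct the other, so no word-to-word alignment between the two sentences is needed.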


Large-Scale Knowledge Graph Identification Using PSL

AAAI Conferences

Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph and we refer to the task of removing noise, inferring missing information, and determining which candidate facts should be included into a knowledge graph as knowledge graph identification. In order to perform this task, we must reason jointly about candidate facts and their associated extraction confidences, identify co-referent entities, and incorporate ontological constraints. Our proposed approach uses probabilistic soft logic (PSL), a recently introduced probabilistic modeling framework which easily scales to millions of facts. We demonstrate the power of our method on a real-world set of extractions from the NELL project containing over 1M extractions and 70K ontological relations. We show that compared to existing methods, our approach is able to achieve improved AUC and F1 with significantly lower running time.


Comparing and Evaluating Semantic Data Automatically Extracted from Text

AAAI Conferences

One way to obtain large amounts of semantic data is to extract facts from the vast quantities of text that are now available on-line. The relatively low accuracy of current information extraction techniques introduces a need for evaluating the quality of the knowledge bases (KBs) they generate. We frame the problem as comparing KBs generated by different systems from the same documents and show that exploiting provenance leads to more efficient techniques for aligning them and identifying their differences. We describe two types of tools: entity-match focuses on differences in entities found and linked; kbdiff focuses on differences in relations among those entities. Together, these tools support assessment of relative KB accuracy by sampling the parts of two KBs that disagree. We explore the usefulness of the tools by constructing tens of different KBs from the same 26,000 Washington Post articles and identifying the differences among them.


Uncertain and Approximative Knowledge Representation to Reasoning on Classification with a Fuzzy Networks Based System

arXiv.org Artificial Intelligence

The approach described here makes it possible to use a fuzzy object-based representation of imprecise and uncertain knowledge. This representation is of great practical interest because it enables reasoning on classification with a system based on fuzzy semantic networks. For instance, the distinction between necessary, possible and user classes makes it possible to take into account exceptions that may appear in a fuzzy knowledge base, and it facilitates the integration of users' objects into the base. We describe the theoretical aspects of the architecture of the experimental A.I. system we built to provide effective on-line assistance to users of new technological systems: understanding "how it works" and "how to complete tasks" from queries in near-natural language. In our model, procedural semantic networks are used to describe the knowledge of an "ideal" expert, while fuzzy sets are used to describe the approximate and uncertain knowledge of novice users in fuzzy semantic networks, which serve to match the fuzzy labels of a query with categories from our "ideal" expert.


Automatic Discovery of Fuzzy Synsets from Dictionary Definitions

AAAI Conferences

In order to deal with ambiguity in natural language, it is common to organise words, according to their senses, in synsets, which are groups of synonymous words that can be seen as concepts. The manual creation of a broad-coverage synset base is a time-consuming task, so we take advantage of dictionary definitions for extracting synonymy pairs and clustering for identifying synsets. Since word senses are not discrete, we create fuzzy synsets, where each word has a membership degree. We report on the results of the creation of a fuzzy synset base for Portuguese, from three electronic dictionaries. The resulting resource is larger than existing handcrafted Portuguese thesauri.
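
As a rough illustration of what a fuzzy synset looks like as a data structure, the sketch below stores each word with a membership degree and recovers a conventional (crisp) synset with an alpha-cut, a standard fuzzy-set operation. The words, the membership values, and the function name `alpha_cut` are hypothetical and are not taken from the resource described above.

```python
def alpha_cut(fuzzy_synset, threshold):
    """Return the crisp set of words whose membership degree is at least `threshold`."""
    return {word for word, degree in fuzzy_synset.items() if degree >= threshold}


# Hypothetical fuzzy synset for the concept "car"; the degrees are invented for illustration.
car_synset = {"carro": 1.0, "automóvel": 0.9, "viatura": 0.4}

# A high threshold keeps only the words most strongly attached to the concept.
print(alpha_cut(car_synset, 0.5))

# A low threshold keeps every word with some degree of membership.
print(alpha_cut(car_synset, 0.1))
```

Choosing the threshold trades coverage for precision: a lower value yields larger synsets that include more loosely related words.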


Impact of Word Sense Disambiguation on Ordering Dictionary Definitions in Vocabulary Learning Tutors

AAAI Conferences

Past research has shown that dictionaries and glosses can be beneficial in computer assisted language learning, particularly in vocabulary learning. We propose that L2 vocabulary learners can benefit from the use of a dictionary whose definitions are sensitive to the provided reading context, and that advances in the natural language processing task of word sense disambiguation can be used to automatically order the definitions of such a dictionary. An in-vivo study was conducted with ESL students to investigate the effect that the order of definitions has on vocabulary learning using REAP, a computer based vocabulary tutor. Our results showed that students benefited from having the algorithmically determined best definitions listed at the top of the definition list. Furthermore, our results suggest that word sense disambiguation may currently be good enough for use in intelligent language tutoring environments.


Grammar-Based Random Walkers in Semantic Networks

arXiv.org Artificial Intelligence

Semantic networks qualify the meaning of an edge relating any two vertices. Determining which vertices are most "central" in a semantic network is difficult because one relationship type may be deemed subjectively more important than another. For this reason, research into semantic network metrics has focused primarily on context-based rankings (i.e. user prescribed contexts). Moreover, many of the current semantic network metrics rank semantic associations (i.e. directed paths between two vertices) and not the vertices themselves. This article presents a framework for calculating semantically meaningful primary eigenvector-based metrics such as eigenvector centrality and PageRank in semantic networks using a modified version of the random walker model of Markov chain analysis. Random walkers, in the context of this article, are constrained by a grammar, where the grammar is a user defined data structure that determines the meaning of the final vertex ranking. The ideas in this article are presented within the context of the Resource Description Framework (RDF) of the Semantic Web initiative.
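
The idea of constraining a random walker can be sketched with a deliberately simplified stand-in: PageRank computed by power iteration, where the walker may only follow edges whose label is in an allowed set. This is not the grammar formalism of the article, which is richer than a label filter. The triples, the predicate names, and the function name `label_constrained_pagerank` are assumptions made for illustration.

```python
def label_constrained_pagerank(triples, allowed_predicates, damping=0.85, iterations=50):
    """PageRank over (subject, predicate, object) triples, following only allowed predicates."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    out_links = {node: [] for node in nodes}
    for subject, predicate, obj in triples:
        if predicate in allowed_predicates:
            out_links[subject].append(obj)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no allowed out-edge is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


# Hypothetical RDF-like triples.
triples = [
    ("a", "cites", "b"),
    ("c", "cites", "b"),
    ("b", "authoredBy", "d"),
]

by_citation = label_constrained_pagerank(triples, {"cites"})
by_authorship = label_constrained_pagerank(triples, {"authoredBy"})
print(max(by_citation, key=by_citation.get))
print(max(by_authorship, key=by_authorship.get))
```

The same graph produces different rankings depending on which relationship types the walker is allowed to traverse, which is the sense in which the constraint determines the meaning of the final vertex ranking.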


Meaning and Links

AI Magazine

This article presents some fundamental ideas about representing knowledge and dealing with meaning in computer representations. I will describe the issues as I currently understand them and describe how they came about, how they fit together, what problems they solve, and some of the things that the resulting framework can do. The ideas apply not just to graph-structured "node-and-link" representations, sometimes called semantic networks, but also to representations referred to variously as frames with slots, entities with relationships, objects with attributes, tables with columns, and records with fields and to the classes and variables of object-oriented data structures. I will start by describing some background experiences and thoughts that preceded the writing of my 1975 paper, "What's in a Link," which introduced many of these issues. After that, I will present some of the key ideas from that paper with a discussion of how some of those ideas have matured since then. Finally, I will describe some practical applications of these ideas in the context of knowledge access and information retrieval and will conclude with some thoughts about where I think we can go from here.


Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods

Journal of Artificial Intelligence Research

In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present two WSD methods based on two main methodological approaches in this research area: a knowledge-based method and a corpus-based method. Our hypothesis is that word-sense disambiguation requires several knowledge sources in order to solve the semantic ambiguity of the words. These sources can be of different kinds: for example, syntagmatic, paradigmatic or statistical information. Our approach combines various sources of knowledge, through combinations of the two WSD methods mentioned above. Mainly, the paper concentrates on how to combine these methods and sources of information in order to achieve good results in the disambiguation. Finally, this paper presents a comprehensive study and experimental work on evaluation of the methods and their combinations.
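
For readers unfamiliar with dictionary-based WSD, the sketch below shows the simplified Lesk heuristic, a well-known knowledge-based baseline that picks the sense whose dictionary gloss shares the most words with the context. It is a stand-in for illustration only and is not either of the two methods presented in the paper. The glosses, the sense labels, and the function name `simplified_lesk` are hypothetical.

```python
def simplified_lesk(context, sense_glosses):
    """Return the sense whose gloss has the largest word overlap with the context."""
    context_words = set(context.lower().split())

    def overlap(sense):
        return len(context_words & set(sense_glosses[sense].lower().split()))

    # On a tie, max() keeps the first sense in insertion order.
    return max(sense_glosses, key=overlap)


# Hypothetical glosses for two senses of "bank".
bank_glosses = {
    "bank_river": "sloping land beside a body of water",
    "bank_finance": "an institution that accepts deposits of money",
}

print(simplified_lesk("she sat on the bank beside the water", bank_glosses))
print(simplified_lesk("he put his money in the bank", bank_glosses))
```

A practical system would at least remove stop words and normalize word forms before counting overlaps; both steps are omitted here to keep the sketch short.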


WORDNET: A Lexical Database for English

Classics
