Grammars & Parsing


SKOS Concepts and Natural Language Concepts: an Analysis of Latent Relationships in KOSs

arXiv.org Artificial Intelligence

The vehicle to represent Knowledge Organization Systems (KOSs) in the environment of the Semantic Web and linked data is the Simple Knowledge Organization System (SKOS). SKOS provides a way to assign a URI to each concept, and this URI functions as a surrogate for the concept. This makes clarifying the ontological meaning of these URIs a primary concern. The aim of this study is to investigate the relation between the ontological substance of KOS concepts and concepts revealed through the grammatical and syntactic formalisms of natural language. For this purpose, we examined the divisibility of concepts in specific KOSs (i.e. a thesaurus, a subject headings system and a classification scheme) by applying Natural Language Processing (NLP) techniques (i.e. morphosyntactic analysis) to the lexical representations (i.e. RDF literals) of SKOS concepts. The results of the comparative analysis reveal that, despite the use of multi-word units, thesauri tend to represent concepts in a way that can hardly be further divided conceptually, while Subject Headings and Classification Schemes, to a certain extent, comprise terms that can be decomposed into more conceptual constituents. Consequently, SKOS concepts deriving from thesauri are more likely to represent atomic conceptual units and thus be more appropriate tools for inference and reasoning. Since identifiers represent the meaning of a concept, complex concepts are neither the most appropriate nor the most efficient way of modelling a KOS for the Semantic Web.


Genetic Programming (Machine Learning/AI): "Santa Fe Trail" problem - Syntax Trees

#artificialintelligence

The syntax tree of the fittest individual is shown for each generation until a solution with perfect fitness is found - and beyond. Not too exciting for a small function/terminal set and a program size limit of 50 instructions but there you go! For details on the "Santa Fe Trail problem" please see https://en.wikipedia.org/wiki/Santa_F...


Four deep learning trends from ACL 2017

@machinelearnbot

"NLP is booming", declared Joakim Nivre at the presidential address of ACL 2017, which I attended in Vancouver earlier this month. As evidenced by the throngs of attendees, interest in NLP is at an all-time high – an increase that is chiefly due to the successes of the deep learning renaissance, which recently swept like a tidal wave over the field. Beneath the optimism, however, I noticed a tangible anxiety at ACL, as one field adjusts to its rapid transformation by another. Researchers asked whether there is anything of the old NLP left – or was it all swept away by the tidal wave? Are neural networks the only technique we need any more?


Natural Language Processing: State of The Art, Current Trends and Challenges

arXiv.org Artificial Intelligence

Natural language processing (NLP) has recently gained much attention for representing and analysing human language computationally. Its applications have spread to various fields such as machine translation, email spam detection, information extraction, summarization, medicine, and question answering. The paper distinguishes four phases: it discusses the different levels of NLP and the components of Natural Language Generation (NLG), presents the history and evolution of NLP, surveys the state of the art through various applications of NLP, and outlines current trends and challenges.


How to make a racist AI without really trying

#artificialintelligence

Recognizing whether people are expressing positive or negative opinions about things has obvious business applications. It's simplistic, sometimes too simplistic, but it's one of the easiest ways to get measurable results from NLP. In a few steps, you can put text in one end and get positive and negative scores out the other, and you never have to figure out what you should do with a parse tree or a graph of entities or any difficult representation like that. This model is not the point of that paper, so don't take this as an attack on their results; it was there as an example of a well-known way to use word vectors.
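The "text in one end, scores out the other" idea can be illustrated with a minimal Python sketch. This is not the model from the article, which builds on word vectors; it is a much simpler lexicon-averaging stand-in. The lexicon `WORD_SCORES`, its values, and the function name `sentiment_score` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only):
# positive words get positive scores, negative words negative scores.
WORD_SCORES = {"good": 1.0, "great": 1.5, "bad": -1.0, "terrible": -1.5}


def sentiment_score(text, word_scores=WORD_SCORES):
    """Average the scores of the known words in `text`.

    Returns 0.0 when no word of the text appears in the lexicon.
    """
    # Lowercase and split into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [word_scores[t] for t in tokens if t in word_scores]
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `sentiment_score("The food was great")` averages over the single known word "great". Because the score depends entirely on what the lexicon (or, in the article's setting, the word vectors) associates with each word, any bias in those associations passes straight through to the output.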


Natural Language Processing Key Terms, Explained

@machinelearnbot

Very broadly, natural language processing (NLP) is a discipline which is interested in how human languages, and, to some extent, the humans who speak them, interact with technology. If a document collection's words are ordered by frequency, and y is used to describe the number of times that the xth word appears, Zipf's observation is concisely captured as $y = cx^{-1/2}$ (item frequency is inversely proportional to item rank). Also known as meaning generation, semantic analysis is interested in determining the meaning of text selections (either character or word sequences). After an input selection of text is read and parsed (analyzed syntactically), the text selection can then be interpreted for meaning.
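As a rough illustration of the rank-frequency relationship described above, the following Python sketch orders words by frequency and estimates the constant c and the exponent of a power law relating count y to rank x. The function names `rank_frequency` and `fit_power_law` are hypothetical, not taken from the article, and the ordinary least-squares fit in log-log space is an assumption chosen for simplicity rather than a recommended estimator.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return (rank, count) pairs, where rank 1 is the most frequent word."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def fit_power_law(pairs):
    """Fit y = c * x**a by least squares on log y = log c + a * log x.

    `pairs` must contain at least two (x, y) points with distinct x > 0
    and y > 0. Returns the tuple (c, a).
    """
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                       # slope in log-log space
    c = math.exp(mean_y - a * mean_x)   # intercept, mapped back from logs
    return c, a
```

In use, one would tokenize a document collection, pass the tokens to `rank_frequency`, and inspect the fitted exponent from `fit_power_law`. Real corpora only approximately follow a power law, so the fitted values should be read as a summary of the trend, not as exact constants.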


Parsing gender stereotypes in Japan's media landscape

The Japan Times

Tomomi Inada's resignation as defense minister ended a tenure that often made reporters wonder if her transgressions had more to do with ignorance than with incompetence. It would be wrong to associate her failures with her sex, though there were some in the media who harped on her fashion sense or supposed emotional instability as indications that she wasn't suitable for the job. Inada didn't actively discourage these indications. In June, she addressed the second plenary session of the International Institute for Strategic Studies' Shangri-La Dialogue in Singapore, where she expressed in English how privileged she felt to "share the podium" with other defense ministers, namely Marise Payne of Australia and Sylvie Goulard of France, saying that "We belong to the same gender … the same generation and, most importantly, we are all good looking." As mentioned in a June 14 article in the Huffington Post, Mayumi Mori, the Asahi Shimbun Singapore correspondent, noted that Inada was obviously making a joke "to relieve tension," and that there were a few chuckles in the hall.


Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

arXiv.org Machine Learning

Especially in the community of Digital Humanities, the automated processing of Latin texts has always been a popular research topic. In a variety of computational applications, such as text reuse detection [Franzini et al, 2015], it is desirable to annotate and augment Latin texts with useful morpho-syntactical or lexical information, such as lemmas. In this paper, we will focus on two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. Given a piece of Latin text, the task of lemmatization involves assigning each word to a single dictionary headword or 'lemma': a baseform label (preferably in a normalized orthography) grouping all word tokens which only differ in spelling and/or inflection [Knowles et al, 2004]. The task of lemmatization is closely related to that of part-of-speech (PoS) tagging [Jurafsky et al, 2000], in which each word in a running text should be assigned a tag indicating its part of speech or word class (e.g.


Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

arXiv.org Machine Learning

Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively.


A Mention-Ranking Model for Abstract Anaphora Resolution

arXiv.org Machine Learning

Resolving abstract anaphora is an important but difficult task for text understanding. Yet, with recent advances in representation learning, this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence-antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and, if disregarding syntax, discriminates candidates using deeper features.