Grammars & Parsing



Four deep learning trends from ACL 2017

#artificialintelligence

Though attention often plays the role of word alignment in NMT, Koehn and Knowles note that it learns to play other, harder-to-understand roles too; thus it is not always as understandable as we might hope. In Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings, Trost and Klakow perform clustering on word embeddings, then cluster those clusters, and so on to obtain a hierarchical tree-like structure. Neural networks are powerful because they can learn arbitrary continuous representations, but humans find discrete information – like language itself – easier to understand. These systems should ideally produce a proof or derivation of the answer – for a semantic parsing question answering system, this might be the semantic parse itself, or a relevant excerpt from the knowledge base.
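
To make the hierarchical idea concrete, here is a minimal sketch using off-the-shelf agglomerative clustering from scipy; it is not Trost and Klakow's parameter-free graph-based method, and the tiny three-dimensional vectors below are stand-ins for real word embeddings:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, to_tree

    words = ["cat", "dog", "wolf", "car", "truck", "bus"]
    vectors = np.array([
        [0.9, 0.1, 0.0],   # toy 3-d "embeddings"; in practice load word2vec/GloVe
        [0.8, 0.2, 0.1],
        [0.7, 0.3, 0.1],
        [0.1, 0.9, 0.8],
        [0.0, 0.8, 0.9],
        [0.1, 0.7, 0.9],
    ])

    # Ward linkage repeatedly merges the two closest clusters, yielding a
    # binary tree whose leaves are words and whose internal nodes are
    # progressively coarser clusters of clusters.
    Z = linkage(vectors, method="ward")

    def show(node, depth=0):
        """Print the hierarchy with indentation marking tree depth."""
        if node.is_leaf():
            print("  " * depth + words[node.id])
        else:
            print("  " * depth + "*")
            show(node.left, depth + 1)
            show(node.right, depth + 1)

    show(to_tree(Z))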


Genetic Programming (Machine Learning/AI): "Santa Fe Trail" problem - Syntax Trees

#artificialintelligence

The syntax tree of the fittest individual is shown for each generation until a solution with perfect fitness is found - and beyond. Not too exciting for a small function/terminal set and a program size limit of 50 instructions, but there you go! For details on the "Santa Fe Trail" problem, please see https://en.wikipedia.org/wiki/Santa_F...
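
As a rough illustration (not the code behind the video), here is a minimal sketch of how such an individual can be encoded as a syntax tree, assuming the classic Santa Fe Trail primitive set (if_food_ahead, progn2, progn3 over move, left, right); only random "grow" initialization is shown, with fitness evaluation and evolution omitted:

    import random

    FUNCTIONS = {"if_food_ahead": 2, "progn2": 2, "progn3": 3}  # name -> arity
    TERMINALS = ["move", "left", "right"]

    def grow(depth, max_depth=4):
        """Randomly grow a syntax tree, represented as nested tuples."""
        if depth >= max_depth or (depth > 0 and random.random() < 0.3):
            return random.choice(TERMINALS)
        name = random.choice(list(FUNCTIONS))
        return (name,) + tuple(grow(depth + 1, max_depth)
                               for _ in range(FUNCTIONS[name]))

    def size(tree):
        """Instruction count, used to enforce a program size limit."""
        return 1 if isinstance(tree, str) else 1 + sum(size(t) for t in tree[1:])

    random.seed(0)
    individual = grow(0)
    while size(individual) > 50:   # the 50-instruction limit mentioned above
        individual = grow(0)
    print(individual, size(individual))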


Four deep learning trends from ACL 2017

@machinelearnbot

Instead she concluded that Structure Is Coming Back, and provided, by example, one reason to embrace its return: linguistic structure reduces the search space of possible outputs, making it easier to generate well-formed output. Chris Dyer also argued for the importance of incorporating linguistic structure into deep learning in his CoNLL keynote Should Neural Network Architecture Reflect Linguistic Structure? Like Noah Smith, he drew attention to the inductive biases inherent in the sequential approach, arguing that RNNs have an inductive bias towards sequential recency, while syntax-guided hierarchical architectures (such as recursive NNs and RNNGs) have an inductive bias towards syntactic recency. While the "language is just sequences" paradigm argues that RNNs can compute anything, researchers are increasingly interested in how the inductive biases of the sequential model affect what they do compute.
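
A toy sketch of the contrast, not any particular RNNG implementation: the same made-up composition function applied strictly left to right versus bottom-up along an assumed parse of "the cat sat down":

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 4
    W = rng.standard_normal((dim, 2 * dim)) * 0.1   # one made-up composition layer
    words = ["the", "cat", "sat", "down"]
    emb = {w: rng.standard_normal(dim) for w in words}

    def compose(a, b):
        """Merge two vectors into one; stands in for an RNN cell or tree cell."""
        return np.tanh(W @ np.concatenate([a, b]))

    # Sequential recency (RNN-style): (((the . cat) . sat) . down)
    h = emb["the"]
    for w in words[1:]:
        h = compose(h, emb[w])

    # Syntactic recency (recursive-NN-style), given the assumed parse
    # (S (NP the cat) (VP sat down)): what counts as "recent" is what is
    # nearby in the tree, not what is adjacent in the string.
    np_node = compose(emb["the"], emb["cat"])
    vp_node = compose(emb["sat"], emb["down"])
    s_node = compose(np_node, vp_node)

    print(h, s_node, sep="\n")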


How to make a racist AI without really trying

#artificialintelligence

Recognizing whether people are expressing positive or negative opinions about things has obvious business applications. It's simplistic, sometimes too simplistic, but it's one of the easiest ways to get measurable results from NLP. In a few steps, you can put text in one end and get positive and negative scores out the other, and you never have to figure out what you should do with a parse tree or a graph of entities or any difficult representation like that. This model is not the point of that paper, so don't take this as an attack on their results; it was there as an example of a well-known way to use word vectors.
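
Here is a minimal sketch of that well-known recipe, assuming tiny made-up lexicons and random stand-ins for real embeddings such as GloVe: train a linear classifier on the vectors of positive and negative lexicon words, then score a sentence by averaging its word vectors:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    vocab = ["good", "great", "happy", "bad", "awful", "sad", "the", "movie", "was"]
    emb = {w: rng.standard_normal(50) for w in vocab}  # stand-in for real GloVe vectors

    # Toy sentiment lexicons; real systems use lists with thousands of words.
    pos, neg = ["good", "great", "happy"], ["bad", "awful", "sad"]
    X = np.array([emb[w] for w in pos + neg])
    y = np.array([1] * len(pos) + [0] * len(neg))
    clf = LogisticRegression().fit(X, y)

    def score(sentence):
        """Text in one end, a positive/negative score out the other."""
        vecs = [emb[w] for w in sentence.lower().split() if w in emb]
        return clf.predict_proba([np.mean(vecs, axis=0)])[0, 1]

    print(score("the movie was great"), score("the movie was awful"))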


Finding the right representation for your NLP data - Tryolabs Blog

@machinelearnbot

When considering what information is important for a certain decision procedure (say, a classification task), there's an interesting gap between what's theoretically (that is, actually) important on the one hand, and what gives good results in practice as input to machine learning (ML) algorithms on the other. On the other hand, embedding syntactic structures in a vector space while making the distance relation meaningful is not quite as easy. Funnily enough, when I tried the two "Dancing Monkeys in a Tuxedo" sentences with Stanford's recursive sentiment analysis tool, it classified both sentences as negative. What you can do in this case is restructure your input vector so that, instead of having a unique, separate feature for the sentiment of the review, you use feature combinations (also called feature crosses), so that all word-frequency features include information about the sentiment.
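
A minimal sketch of the feature-cross idea with made-up feature names: each word-frequency feature is multiplied by the review's sentiment value (+1 or -1), so every crossed feature carries both signals at once:

    from collections import Counter

    def crossed_features(text, sentiment):
        """Return {word x sentiment: frequency * sentiment} feature crosses."""
        freqs = Counter(text.lower().split())
        return {f"{word}_x_sent": count * sentiment
                for word, count in freqs.items()}

    # Sentiment encoded as +1 (positive review) / -1 (negative review).
    print(crossed_features("great plot great acting", +1))
    # {'great_x_sent': 2, 'plot_x_sent': 1, 'acting_x_sent': 1}
    print(crossed_features("great plot great acting", -1))
    # {'great_x_sent': -2, 'plot_x_sent': -1, 'acting_x_sent': -1}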


Natural Language Processing Key Terms, Explained

@machinelearnbot

Very broadly, natural language processing (NLP) is a discipline which is interested in how human languages, and, to some extent, the humans who speak them, interact with technology. If a document collection's words are ordered by frequency, and y is used to describe the number of times that the xth word appears, Zipf's observation is concisely captured as y = cx^-1 (item frequency is inversely proportional to item rank). Also known as meaning generation, semantic analysis is interested in determining the meaning of text selections (either character or word sequences). After an input selection of text is read and parsed (analyzed syntactically), the text selection can then be interpreted for meaning.
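
A quick way to sanity-check Zipf's observation on a corpus: rank words by frequency and inspect frequency times rank, which should stay roughly constant; the toy counts below are made up:

    from collections import Counter

    # Made-up frequencies; substitute real counts from a document collection.
    counts = Counter({"the": 1000, "of": 520, "and": 340, "to": 248, "a": 205})

    for rank, (word, freq) in enumerate(sorted(counts.items(),
                                               key=lambda kv: -kv[1]), start=1):
        # Under Zipf's law, freq ~ c / rank, so freq * rank should hover near c.
        print(word, rank, freq, freq * rank)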


Parsing gender stereotypes in Japan's media landscape

The Japan Times

As mentioned in a June 14 article in the Huffington Post, Mayumi Mori, the Asahi Shimbun's Singapore correspondent, noted that Inada was obviously making a joke "to relieve tension," and that there were a few chuckles in the hall. The author of the Huffington Post article, editor-in-chief Ryan Takeshita, wrote that Inada has always played "cute" to be accepted by the men who control Japan's political world, but even if her remark about the female ministers' looks was made in jest, it reinforced the idea held by many people that appearance is paramount, especially for women. Japan doesn't have a monopoly on sexist behavior and attitudes, but according to a recent series of forums in the Asahi Shimbun, the Japanese media still subscribes to gender stereotypes in advertising and reporting. At least the Unicharm spot stimulated a debate about how the media portrays gender roles.


The Stanford Natural Language Processing Group

@machinelearnbot

As such, there has been a surge of academic and commercial interest in predicting values for gender, age, race, location, interests, personality, and more, given some portion of the information available in data about individuals, including social profiles, customer records, and more. Such programs are informed by research in natural language processing, computer vision, psychology and related fields, and they can be used for positive, negative, and mixed ends. His main research interests include categorial grammars, parsing, semi-supervised learning for NLP, reference resolution and text geolocation. He has long been active in the creation and promotion of open source software for natural language processing: he is one of the co-creators of the Apache OpenNLP Toolkit and OpenCCG, and he has contributed to many others, including ScalaNLP, Junto, and TextGrounder.