Grammars & Parsing


Packet Parsing Accolade Technology - Intelligent Host CPU Offload 1-100GE

#artificialintelligence

Each ANIC adapter has a very powerful and flexible L2/L3/L4 packet parser. The header information from each packet that enters the system is extracted and processed, both to inform the host application about relevant packet details and to serve as input for packet filtering. The parser recognizes various tunneling and encapsulation protocols such as VLAN, VXLAN, MPLS, GTP and GRE. The adapter can then intelligently strip away the tunnel encapsulations and analyze the relevant packet information contained inside the tunnel.
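
As a rough illustration of the kind of header extraction such a parser performs, here is a minimal sketch in Python that walks an Ethernet frame, strips one optional 802.1Q VLAN tag, and pulls out the IPv4 fields. This is not Accolade's implementation; real adapters do this in hardware across many more encapsulations, but the control flow is the same: inspect a type field, strip a layer, continue parsing what is inside.

    import struct

    def parse_frame(frame: bytes) -> dict:
        """Extract L2/L3 header fields, stripping one 802.1Q VLAN tag if present."""
        info = {}
        dst, src, ethertype = struct.unpack_from("!6s6sH", frame, 0)
        offset = 14
        if ethertype == 0x8100:          # 802.1Q VLAN tag: strip and record it
            tci, ethertype = struct.unpack_from("!HH", frame, offset)
            info["vlan_id"] = tci & 0x0FFF
            offset += 4
        info["ethertype"] = hex(ethertype)
        if ethertype == 0x0800:          # IPv4: pull out protocol and addresses
            ihl = (frame[offset] & 0x0F) * 4
            info["protocol"] = frame[offset + 9]
            info["src_ip"] = ".".join(str(b) for b in frame[offset + 12:offset + 16])
            info["dst_ip"] = ".".join(str(b) for b in frame[offset + 16:offset + 20])
            info["l4_offset"] = offset + ihl   # where L4 parsing would continue
        return info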


AI-powered grammar tools from Google and others make sentence-parsing a thing of the past. Parents and teachers wonder if kids will suffer. - The Washington Post

#artificialintelligence

While some education experts applaud the advancement of high-tech grammar tools as a way to help people more clearly express their thoughts, others aren't so sure. Artificial intelligence, according to the contrarians, is only as smart as the humans who program it, and often just as biased. "Language is part of your heritage and identity, and if you're using a tool that is constantly telling you, 'You're wrong,' that is not a good thing," said Paulo Blikstein, associate professor of communications, media and learning technology design at Columbia University Teachers College. "There is not one mythical, monolithical (English) … And every time we have tried to curtail the evolution of a language, it has never gone well." In the era of spellcheck and auto-correct, does it matter that my son can't spell?


Part of speech - Word Tagger

#artificialintelligence

The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The part of speech explains how a word is used in a sentence. There are eight main parts of speech -- nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions and interjections. The collection of tags used for a particular task is known as a tagset.
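
For a concrete feel, here is a short example using NLTK's off-the-shelf tagger (assumes the nltk package is installed; the tagger model name can vary across NLTK versions). Note how the same word, "refuse", receives two different tags depending on its use in the sentence.

    import nltk

    # One-time model download (a quiet no-op if already present).
    nltk.download("averaged_perceptron_tagger", quiet=True)

    tokens = "They refuse to permit us to obtain the refuse permit".split()
    print(nltk.pos_tag(tokens))
    # Expected output (Penn Treebank tagset):
    # [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'),
    #  ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'),
    #  ('refuse', 'NN'), ('permit', 'NN')]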


Language, trees, and geometry in neural networks

#artificialintelligence

In each pair, the left image is a traditional parse tree view, except that the vertical length of each branch represents embedding distance. The right image is a PCA projection of the context embeddings, where color shows deviation from the expected distance.
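
As a hedged sketch of how such a right-hand projection can be produced, the following assumes the transformers, torch, and scikit-learn packages, with bert-base-uncased standing in for the model behind the figures and an arbitrary example sentence:

    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.decomposition import PCA

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    sentence = "The chef who ran to the store was out of food."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # One context embedding per (sub)word token, from the last layer.
        embeddings = model(**inputs).last_hidden_state.squeeze(0).numpy()

    # Project the high-dimensional context embeddings onto their first
    # two principal components for plotting.
    coords = PCA(n_components=2).fit_transform(embeddings)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    for tok, (x, y) in zip(tokens, coords):
        print(f"{tok:>10s}  ({x:+.2f}, {y:+.2f})")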


Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention

arXiv.org Artificial Intelligence

Previous work on bridging anaphora recognition (Hou et al., 2013a) casts the problem as a subtask of learning fine-grained information status (IS). However, these systems heavily depend on many handcrafted linguistic features. In this paper, we propose a discourse context-aware self-attention neural network model for fine-grained IS classification. On the ISNotes corpus (Markert et al., 2012), our model with contextually-encoded word representations (BERT) (Devlin et al., 2018) achieves new state-of-the-art performance on fine-grained IS classification, obtaining a 4.1% absolute overall accuracy improvement compared to Hou et al. (2013a). More importantly, we also show an improvement of 3.9% F1 for bridging anaphora recognition without using any complex handcrafted semantic features designed for capturing the bridging phenomenon. Information structure (Halliday, 1967; Prince, 1981, 1992; Gundel et al., 1993; Lambrecht, 1994; Birner and Ward, 1998; Kruijff-Korbayová and Steedman, 2003) studies the structural and semantic properties of a sentence according to its relation to the discourse context.
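
The summary does not spell out the architecture, so the following is only a minimal sketch of the general idea (a self-attention layer over contextual embeddings of a discourse window, followed by a mention-level classifier), with the class count and all dimensions invented for illustration:

    import torch
    import torch.nn as nn

    NUM_CLASSES = 8   # hypothetical number of fine-grained IS categories

    class ISClassifier(nn.Module):
        """Toy discourse-context-aware classifier: self-attention over the
        token embeddings of a discourse window, then a mention-level softmax."""
        def __init__(self, dim=768, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.out = nn.Linear(dim, NUM_CLASSES)

        def forward(self, context_emb, mention_mask):
            # context_emb: (batch, seq, dim) contextual embeddings (e.g. BERT).
            # mention_mask: (batch, seq) with 1s on the mention's tokens.
            attended, _ = self.attn(context_emb, context_emb, context_emb)
            # Mean-pool the attended vectors over the mention span.
            mask = mention_mask.unsqueeze(-1)
            mention = (attended * mask).sum(1) / mask.sum(1).clamp(min=1)
            return self.out(mention)

    # Smoke test with random stand-in "embeddings".
    model = ISClassifier()
    emb = torch.randn(2, 50, 768)
    mask = torch.zeros(2, 50); mask[:, 10:13] = 1
    print(model(emb, mask).shape)   # torch.Size([2, 8])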


A Generate-Validate Approach to Answering Questions about Qualitative Relationships

arXiv.org Artificial Intelligence

Qualitative relationships describe how increasing or decreasing one property (e.g. altitude) affects another (e.g. temperature). They are an important aspect of natural language question answering and are crucial for building chatbots or voice agents where one may enquire about qualitative relationships. Recently a dataset for question answering involving qualitative relationships has been proposed, and a few approaches to answering such questions have been explored, at the heart of which lies a semantic parser that converts the natural language input to a suitable logical form. A problem with existing semantic parsers is that they try to directly convert the input sentences to a logical form. Since the output language varies with each application, this forces the semantic parser to learn almost everything from scratch. In this paper, we show that instead of using a semantic parser to produce the logical form, if we apply the generate-validate framework, i.e. generate a natural language description of the logical form and validate whether that description follows from the input text, we get better scope for transfer learning, and our method outperforms the state of the art by a large margin of 7.93%.
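
A hedged sketch of the generate-validate idea, using an off-the-shelf NLI model as the validator (assumes the transformers and torch packages and the roberta-large-mnli checkpoint; the passage and candidate descriptions are invented for illustration):

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "roberta-large-mnli"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    passage = ("As air rises up a mountain slope it cools, "
               "so higher altitudes have lower temperatures.")

    # "Generate" step: render each candidate logical form as plain language.
    candidates = [
        "If altitude increases then temperature increases.",
        "If altitude increases then temperature decreases.",
    ]

    # "Validate" step: score whether each description follows from the passage.
    with torch.no_grad():
        batch = tokenizer([passage] * len(candidates), candidates,
                          return_tensors="pt", padding=True)
        probs = model(**batch).logits.softmax(-1)

    entail = probs[:, model.config.label2id["ENTAILMENT"]]
    for cand, p in zip(candidates, entail):
        print(f"entailment={p.item():.2f}  {cand}")
    print("chosen:", candidates[int(entail.argmax())])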


Lexical semantics - Wikipedia

#artificialintelligence

Lexical semantics (also known as lexicosemantics) is a subfield of linguistic semantics. The units of analysis in lexical semantics are lexical units, which include not only words but also sub-words or sub-units such as affixes, and even compound words and phrases. Lexical units make up the catalogue of words in a language, the lexicon. Lexical semantics looks at how the meaning of the lexical units correlates with the structure of the language or syntax. This is referred to as the syntax-semantics interface.[1] Lexical units, also referred to as syntactic atoms, can either stand alone, as in the case of root words or parts of compound words, or they necessarily attach to other units, as prefixes and suffixes do. The former are called free morphemes and the latter bound morphemes.[2]


Self-Organizing Maps with Variable Input Length for Motif Discovery and Word Segmentation

arXiv.org Machine Learning

Time Series Motif Discovery (TSMD) is defined as searching for patterns that are previously unknown and appear with a given frequency in time series. Another problem strongly related to TSMD is word segmentation, which has received much attention from the community that studies early language acquisition in babies and toddlers. The development of biologically plausible models for word segmentation could greatly advance this field. Therefore, in this article, we propose the Variable Input Length Map (VILMAP) for motif discovery and word segmentation. The model is based on self-organizing maps and can identify motifs of different lengths in time series. In our experiments, we show that VILMAP achieves good results in finding motifs in a standard motif discovery dataset and can avoid catastrophic forgetting when trained on datasets with increasing input sizes. We also show that VILMAP achieves results similar or superior to other methods in the literature developed for the task of word segmentation.
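
VILMAP itself extends the SOM to variable-length inputs; as a point of reference, here is a minimal fixed-length self-organizing map in plain NumPy, with all sizes and the decay schedules being illustrative choices rather than the paper's:

    import numpy as np

    rng = np.random.default_rng(0)

    def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0):
        """Train a 2-D self-organizing map on fixed-length input vectors."""
        h, w = grid
        weights = rng.random((h, w, data.shape[1]))
        # Grid coordinates, used by the neighborhood function.
        coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), -1)
        n_steps, step = epochs * len(data), 0
        for _ in range(epochs):
            for x in rng.permutation(data):
                t = step / n_steps
                lr, sigma = lr0 * (1 - t), sigma0 * (1 - t) + 1e-3
                # Best-matching unit: node whose weight vector is closest to x.
                dists = np.linalg.norm(weights - x, axis=-1)
                bmu = np.unravel_index(dists.argmin(), dists.shape)
                # Gaussian neighborhood pulls nearby nodes toward x.
                g = np.exp(-((coords - bmu) ** 2).sum(-1) / (2 * sigma ** 2))
                weights += lr * g[..., None] * (x - weights)
                step += 1
        return weights

    # Toy "motifs": three noisy prototype windows of length 8.
    protos = rng.random((3, 8))
    data = np.concatenate([p + 0.05 * rng.standard_normal((50, 8)) for p in protos])
    som = train_som(data)
    print(som.shape)   # (10, 10, 8)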


Semantic Role Labeling with Associated Memory Network

arXiv.org Artificial Intelligence

Semantic role labeling (SRL) is the task of recognizing all the predicate-argument pairs of a sentence; its performance improvement has hit a bottleneck after a series of recent works. This paper proposes a novel syntax-agnostic SRL model enhanced by the proposed associated memory network (AMN), which uses inter-sentence attention over label-known associated sentences as a kind of memory to further enhance dependency-based SRL. In detail, we use sentences and their labels from the training dataset as an associated memory cue to help label the target sentence. Furthermore, we compare several strategies for selecting associated sentences and several label merging methods in AMN to find and utilize the labels of associated sentences while attending to them. By leveraging the attentive memory from known training data, our full model reaches state-of-the-art on the CoNLL-2009 benchmark datasets for the syntax-agnostic setting, showing an effective new line of SRL enhancement other than exploiting external resources such as pre-trained language models. SRL is a shallow semantic parsing task that has been widely used in a series of natural language processing (NLP) tasks, such as information extraction (Liu et al., 2016) and question answering (Abujabal et al., 2017). Generally, SRL is decomposed into four classification subtasks in pipeline systems.
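
The following is only a minimal sketch of the memory idea rather than the paper's full AMN: each target-sentence token attends over tokens of label-known associated sentences and mixes the retrieved label information back into its representation. All names and dimensions are illustrative.

    import torch
    import torch.nn.functional as F

    dim, n_labels = 128, 10          # illustrative sizes

    # Target sentence: 12 token representations from some encoder.
    target = torch.randn(12, dim)

    # Memory: 3 associated sentences x 15 tokens, with known role labels.
    memory = torch.randn(3 * 15, dim)
    memory_labels = torch.randint(0, n_labels, (3 * 15,))
    label_emb = torch.nn.Embedding(n_labels, dim)

    # Inter-sentence attention: each target token attends over all memory tokens.
    attn = F.softmax(target @ memory.T / dim ** 0.5, dim=-1)   # (12, 45)

    # Retrieve a soft mixture of the memory tokens' label embeddings.
    retrieved = attn @ label_emb(memory_labels)                # (12, dim)

    # Enhanced representation: original token features plus memory evidence,
    # which would feed the downstream SRL argument classifier.
    enhanced = target + retrieved
    print(enhanced.shape)   # torch.Size([12, 128])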


Aroma: Using ML for code recommendation

#artificialintelligence

Thousands of engineers write the code to create our apps, which serve billions of people worldwide. This is no trivial task--our services have grown so diverse and complex that the codebase contains millions of lines of code intersecting with a wide variety of systems, from messaging to image rendering. To simplify and speed up the process of writing code that will make an impact on so many systems, engineers often want a way to find out how someone else has handled a similar task. We created Aroma, a code-to-code search and recommendation tool that uses machine learning (ML) to make the process of gaining insights from big codebases much easier. Prior to Aroma, none of the existing tools fully addressed this problem.