
Modelling Morphological Features


This is an excerpt from my master's thesis, titled "Semi-supervised morphological reinflection using rectified random variables". Languages use suffixes and prefixes to convey context, stress, intonation, and grammatical meaning (such as subject-verb agreement). Such suffixes and prefixes belong to a more general class of entities, the meaningful sub-parts of a word, which are called morphemes. A language's morphology refers to the rules and processes through which morphemes are combined; this allows a word to express its syntactic categories and semantic meaning. For example, in English, a verb can have three tenses: past, present, and future. These are the inflected forms of the verb.
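As a toy illustration of inflection (not from the thesis itself), a few suffixation rules plus a small exception lexicon can map an English verb lemma to its past-tense form; the verb list and rules here are illustrative only, and real morphological modelling needs far richer machinery:

```python
# Tiny exception lexicon for irregular verbs (illustrative, not exhaustive).
IRREGULAR_PAST = {"go": "went", "be": "was", "have": "had"}

def past_tense(lemma: str) -> str:
    """Inflect an English verb lemma into its past-tense form
    using a handful of regular suffixation rules."""
    if lemma in IRREGULAR_PAST:
        return IRREGULAR_PAST[lemma]
    if lemma.endswith("e"):
        return lemma + "d"           # bake -> baked
    if lemma.endswith("y") and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ied"    # try -> tried
    return lemma + "ed"              # walk -> walked

print(past_tense("walk"))  # walked
print(past_tense("try"))   # tried
print(past_tense("go"))    # went
```

Even this sketch shows why morphology is modelled as rules plus exceptions: the regular suffixation process is productive, while irregular forms must be memorised.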

Natural Language Processing with spaCy - Steps and Examples


Part-of-speech tagging assigns a part of speech (such as noun, verb, pronoun, adverb, conjunction, adjective, or interjection) to each word of a given text, based on its definition and its context. Part-of-speech tagging can be done in spaCy using token attributes, and spacy.explain can be used to turn each tag into a complete descriptive label.

Roadmap to Natural Language Processing (NLP) - KDnuggets


Natural Language Processing (NLP) is the area of research in Artificial Intelligence focused on processing and using text and speech data to create smart machines and derive insights. One of today's most interesting NLP applications is creating machines able to discuss complex topics with humans; IBM's Project Debater represents one of the most successful approaches in this area so far. All of these preprocessing techniques can be easily applied to different types of texts using standard Python NLP libraries such as NLTK and spaCy. Additionally, in order to extract the syntax and structure of our text, we can make use of techniques such as Part-of-Speech (POS) tagging and shallow parsing (Figure 1).
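To make the preprocessing steps concrete, here is a hand-rolled sketch of the usual pipeline (lowercasing, tokenization, stop-word removal); the stop-word list is a tiny illustrative one, whereas in practice NLTK or spaCy supply full tokenizers, stop-word lists, and lemmatizers:

```python
import re

# Tiny illustrative stop-word list; NLTK/spaCy ship much larger ones.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}

def preprocess(text: str) -> list[str]:
    text = text.lower()                    # normalise case
    tokens = re.findall(r"[a-z']+", text)  # crude word tokenization
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The IBM Project Debater is able to discuss complex topics."))
# ['ibm', 'project', 'debater', 'able', 'discuss', 'complex', 'topics']
```

The same three steps, applied via a library instead of by hand, are typically the first stage before POS tagging or shallow parsing.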

Machine Learning in Static Code Analysis


Machine learning has become firmly entrenched in a variety of human fields, from speech recognition to medical diagnosis. The popularity of this approach is so great that people try to use it wherever they can, yet some attempts to replace classical approaches with neural networks turn out to be unsuccessful. This time we'll consider machine learning in terms of creating effective static code analyzers for finding bugs and potential vulnerabilities. The PVS-Studio team is often asked whether we want to start using machine learning to find bugs in software source code. The short answer is yes, but only to a limited extent. We believe that machine learning applied to code analysis hides many pitfalls; we will discuss them in the second part of the article.

Let's start with a review of new solutions and ideas. Nowadays there are many static analyzers based on, or using, machine learning, including deep learning and NLP, for error detection. Not only enthusiasts but also large companies, for example Facebook, Amazon, and Mozilla, have doubled down on machine learning's potential. Some projects aren't full-fledged static analyzers, as they only find certain kinds of errors in commits. Interestingly, almost all of them are positioned as game-changing products that will make a breakthrough in the development process thanks to artificial intelligence.

Let's look at some well-known examples. DeepCode is a vulnerability-searching tool for Java, JavaScript, TypeScript, and Python code that features machine learning as a component. According to Boris Paskalev, more than 250,000 rules are already in place. This tool learns from changes made by developers in the source code of open-source projects (a million repositories). The company itself says that their project is a kind of Grammarly for developers: in effect, the analyzer compares your solution with its project base and offers you the best solution inferred from the experience of other developers.

In May 2018, the developers said that support for C was on its way, but so far this language is not supported. However, as stated on the site, support for a new language can be added in a matter of weeks, because the analyzer depends on the language at only one stage: parsing. A series of posts about the basic methods of the analyzer is also available on the site. Facebook is quite zealous in its attempts to introduce new comprehensive approaches in its products.
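For contrast with the ML-based tools above, a classical rule-based check is just an AST pattern match. The following sketch (my own illustration, unrelated to any of the products mentioned) uses Python's ast module to flag self-comparisons like `x == x`, a classic copy-paste bug that rule-based analyzers catch:

```python
import ast

def find_self_comparisons(source: str) -> list[int]:
    """Return line numbers of comparisons whose two operands are
    structurally identical (e.g. `a == a`), a typical copy-paste bug."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare) and len(node.comparators) == 1:
            left, right = node.left, node.comparators[0]
            # ast.dump gives a canonical string form, so equal dumps
            # mean structurally identical subtrees.
            if ast.dump(left) == ast.dump(right):
                lines.append(node.lineno)
    return lines

code = "if a == a:\n    pass\nif a == b:\n    pass\n"
print(find_self_comparisons(code))  # [1]
```

A handful of such hand-written patterns is what "classical" analyzers are built from; the ML-based tools attempt to learn patterns like this from repositories instead.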

NLP: My Solution to Kaggle's Disaster Tweet Competition


Natural language processing, or NLP, is a subfield of linguistics, computer science and artificial intelligence concerned with the interactions between computers and human language, with particular emphasis on how to program computers to process and analyse large amounts of natural language data. One crucial point about NLP is that text is bundled up in objects that must be converted to a numeric representation before the CPU can do anything with them. In fact, the decimal numbers we see on the computer screen must also be converted to a base-two number system composed of ones and zeros before the computer can use them, because the device is designed around binary logic at the heart of its processing unit. NLP has many applications that help us in our modern lives. Sentiment analysis, for example, is able to recognise subtle nuances in emotion and opinion, and determine whether they are positive or negative.
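The conversion from text to numbers can be as simple as a bag-of-words count. A minimal sketch (in practice scikit-learn's CountVectorizer or a spaCy/NLTK pipeline would be used; the two example tweets are invented):

```python
def bag_of_words(texts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary over all texts, then map each text
    to a vector of word counts, the numeric form models consume."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        vec = [0] * len(vocab)
        for w in t.lower().split():
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vecs = bag_of_words(["forest fire near the lake", "fire fire everywhere"])
print(vocab)  # ['everywhere', 'fire', 'forest', 'lake', 'near', 'the']
print(vecs)   # [[0, 1, 1, 1, 1, 1], [1, 2, 0, 0, 0, 0]]
```

Each tweet becomes a fixed-length integer vector, which is exactly the kind of numeric representation a classifier can be trained on.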

NLP - Natural Language Processing with Python


NLP - Natural Language Processing with Python. Learn to use machine learning, spaCy, NLTK, scikit-learn, deep learning, and more to conduct Natural Language Processing. Welcome to the best Natural Language Processing course on the internet! This course is designed to be your complete online resource for learning how to use Natural Language Processing with the Python programming language. In the course we will cover everything you need to learn in order to become a world-class practitioner of NLP with Python. We'll start off with the basics, learning how to open and work with text and PDF files in Python, as well as how to use regular expressions to search for custom patterns inside text files. Afterwards we will move on to the basics of Natural Language Processing, utilizing the Natural Language Toolkit (NLTK) library for Python, as well as the state-of-the-art spaCy library for ultra-fast tokenization, parsing, entity recognition, and lemmatization of text.

SQuARE: Semantics-based Question Answering and Reasoning Engine Artificial Intelligence

Understanding the meaning of a text is a fundamental challenge of natural language understanding (NLU), and from its early days the field has given it significant attention through question answering (QA) tasks. We introduce a general semantics-based framework for natural language QA and also describe the SQuARE system, an application of this framework. The framework is based on the denotational semantics approach widely used in programming language research. In our framework, a valuation function maps the syntax tree of a text to its commonsense meaning, represented using basic knowledge primitives (the semantic algebra) coded in answer set programming (ASP). We illustrate an application of this framework by using VerbNet primitives as our semantic algebra and a novel algorithm based on partial tree matching that generates an answer set program representing the knowledge in the text. A question posed against that text is converted into an ASP query using the same framework and executed with the s(CASP) goal-directed ASP system. Our approach is based purely on (commonsense) reasoning. SQuARE achieves 100% accuracy on all five of the bAbI QA datasets that we have tested. The significance of our work is that, unlike machine-learning-based approaches, ours is based on "understanding" the text and does not require any training. SQuARE can also generate an explanation for an answer while maintaining high accuracy.

Faster Smarter Induction in Isabelle/HOL with SeLFiE Artificial Intelligence

Proof by induction is a long-standing challenge in Computer Science. Induction tactics of proof assistants facilitate proof by induction, but rely on humans to manually specify how to apply induction. In this paper, we present SeLFiE, a domain-specific language for encoding experienced users' expertise on how to apply the induct tactic in Isabelle/HOL: when we apply an induction heuristic written in SeLFiE to an inductive problem and arguments to the induct tactic, the SeLFiE interpreter examines both the syntactic structure of the problem and the semantics of the relevant constants, to judge whether the arguments to the induct tactic are plausible according to the heuristic. We then present semantic_induct, an automatic tool that recommends how to apply the induct tactic. Given an inductive problem, semantic_induct produces candidate arguments to the induct tactic and selects promising ones using heuristics written in SeLFiE. Our evaluation on 254 inductive problems from nine problem domains shows that semantic_induct achieved an improvement of 15.7 percentage points in the coincidence rate for the three most promising recommendations, while achieving a 43% reduction in median execution time, compared to an existing tool, smart_induct.

Grounded Adaptation for Zero-shot Executable Semantic Parsing Artificial Intelligence

We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g. new database schemas). GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified. On the Spider, Sparc, and CoSQL zero-shot semantic parsing tasks, GAZP improves the logical form and execution accuracy of the baseline parser. Our analyses show that GAZP outperforms data augmentation in the training environment, that performance increases with the amount of GAZP-synthesized data, and that cycle-consistency is central to successful adaptation.

Leveraging Semantic Parsing for Relation Linking over Knowledge Bases Artificial Intelligence

Knowledge base question answering (KBQA) systems are heavily dependent on relation extraction and linking modules. However, the task of extracting and linking relations from text to knowledge bases faces two primary challenges: the ambiguity of natural language and a lack of training data. To overcome these challenges, we present SLING, a relation linking framework which leverages semantic parsing using Abstract Meaning Representation (AMR) and distant supervision. SLING integrates multiple relation linking approaches that capture complementary signals such as linguistic cues, rich semantic representation, and information from the knowledge base. Experiments on relation linking using three KBQA datasets (QALD-7, QALD-9, and LC-QuAD 1.0) demonstrate that the proposed approach achieves state-of-the-art performance on all benchmarks.