Grammars & Parsing
Top 50 NLP Interview Questions and Answers in 2023
Natural Language Processing helps machines understand and analyze natural languages. NLP is an automated process that helps extract the required information from data by applying machine learning algorithms. Learning NLP will help you land a high-paying job as it is used by various professionals such as data scientist professionals, machine learning engineers, etc. We have compiled a comprehensive list of NLP Interview Questions and Answers that will help you prepare for your upcoming interviews. You can also check out these free NLP courses to help with your preparation. Once you have prepared the following commonly asked questions, you can get into the job role you are looking for. Without further ado, let's kickstart your NLP learning journey.
Analysis of Male and Female Speakers' Word Choices in Public Speeches
Hossain, Md Zobaer, Samin, Ahnaf Mozib
The extent to which men and women use language differently has been questioned previously. Finding clear and consistent gender differences in language is not conclusive in general, and the research is heavily influenced by the context and method employed to identify the difference. In addition, the majority of the research was conducted in written form, and the sample was collected in writing. Therefore, we compared the word choices of male and female presenters in public addresses such as TED lectures. The frequency of numerous types of words, such as parts of speech (POS), linguistic, psychological, and cognitive terms were analyzed statistically to determine how male and female speakers use words differently. Based on our data, we determined that male speakers use specific types of linguistic, psychological, cognitive, and social words in considerably greater frequency than female speakers.
Measuring Alignment Bias in Neural Seq2Seq Semantic Parsers
Locatelli, Davide, Quattoni, Ariadna
Prior to deep learning the semantic parsing community has been interested in understanding and modeling the range of possible word alignments between natural language sentences and their corresponding meaning representations. Sequence-to-sequence models changed the research landscape suggesting that we no longer need to worry about alignments since they can be learned automatically by means of an attention mechanism. More recently, researchers have started to question such premise. In this work we investigate whether seq2seq models can handle both simple and complex alignments. To answer this question we augment the popular Geo semantic parsing dataset with alignment annotations and create Geo-Aligned. We then study the performance of standard seq2seq models on the examples that can be aligned monotonically versus examples that require more complex alignments. Our empirical study shows that performance is significantly better over monotonic alignments.
LERT: A Linguistically-motivated Pre-trained Language Model
Cui, Yiming, Che, Wanxiang, Wang, Shijin, Liu, Ting
Pre-trained Language Model (PLM) has become a representative foundation model in the natural language processing field. Most PLMs are trained with linguistic-agnostic pre-training tasks on the surface form of the text, such as the masked language model (MLM). To further empower the PLMs with richer linguistic features, in this paper, we aim to propose a simple but effective way to learn linguistic features for pre-trained language models. We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original MLM pre-training task, using a linguistically-informed pre-training (LIP) strategy. We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements over various comparable baselines. Furthermore, we also conduct analytical experiments in various linguistic aspects, and the results prove that the design of LERT is valid and effective. Resources are available at https://github.com/ymcui/LERT
Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database
Liu, Ye, Yavuz, Semih, Meng, Rui, Radev, Dragomir, Xiong, Caiming, Zhou, Yingbo
Parsing natural language questions into executable logical forms is a useful and interpretable way to perform question answering on structured data such as knowledge bases (KB) or databases (DB). However, existing approaches on semantic parsing cannot adapt to both modalities, as they suffer from the exponential growth of the logical form candidates and can hardly generalize to unseen data. In this work, we propose Uni-Parser, a unified semantic parser for question answering (QA) on both KB and DB. We introduce the primitive (relation and entity in KB, and table name, column name and cell value in DB) as an essential element in our framework. The number of primitives grows linearly with the number of retrieved relations in KB and DB, preventing us from dealing with exponential logic form candidates. We leverage the generator to predict final logical forms by altering and composing topranked primitives with different operations (e.g. select, where, count). With sufficiently pruned search space by a contrastive primitive ranker, the generator is empowered to capture the composition of primitives enhancing its generalization ability. We achieve competitive results on multiple KB and DB QA benchmarks more efficiently, especially in the compositional and zero-shot settings.
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Syvokon, Oleksiy, Nahorna, Olena
We present a corpus professionally annotated for grammatical error correction (GEC) and fluency edits in the Ukrainian language. To the best of our knowledge, this is the first GEC corpus for the Ukrainian language. We collected texts with errors (20,715 sentences) from a diverse pool of contributors, including both native and non-native speakers. The data cover a wide variety of writing domains, from text chats and essays to formal writing. Professional proofreaders corrected and annotated the corpus for errors relating to fluency, grammar, punctuation, and spelling. This corpus can be used for developing and evaluating GEC systems in Ukrainian. More generally, it can be used for researching multilingual and low-resource NLP, morphologically rich languages, document-level GEC, and fluency correction. The corpus is publicly available at https://github.com/grammarly/ua-gec
Review of coreference resolution in English and Persian
Mohammadi, Hassan Haji, Talebpour, Alireza, Aznaveh, Ahmad Mahmoudi, Yazdani, Samaneh
Coreference resolution (CR) is one of the most challenging areas of natural language processing. This task seeks to identify all textual references to the same real-world entity. Research in this field is divided into coreference resolution and anaphora resolution. Due to its application in textual comprehension and its utility in other tasks such as information extraction systems, document summarization, and machine translation, this field has attracted considerable interest. Consequently, it has a significant effect on the quality of these systems. This article reviews the existing corpora and evaluation metrics in this field. Then, an overview of the coreference algorithms, from rule-based methods to the latest deep learning techniques, is provided. Finally, coreference resolution and pronoun resolution systems in Persian are investigated.
Perspectives on neural proof nets
Proof nets are a way of representing proofs as a type of (hyper)graph. Originally introduced for linear logic (Girard 1987), proof nets can be seen as a parallelised sequent calculus which removes inessential rule permutations, but also as a multi-conclusion natural deduction which simplifies many of the logical rules (notably the E, E, E rules). This make proof nets a good choice for automated theorem proving: avoiding needless rule permutations entails an important reduction of the search space for proofs (compared to sequent calculus, and to a somewhat lesser extent when compared to natural deduction) but still allows us to compute the lambda terms corresponding to our proofs: enumerating all different proof nets for a sequent is equivalent to enumerating all its different lambda terms. Proof nets can be adapted to different types of type-logical grammars while preserving their good logical properties (Moot 2021). This makes them an important tool for testing the predictions of different grammars written in typelogical formalisms.
Complex Knowledge Base Question Answering: A Survey
Lan, Yunshi, He, Gaole, Jiang, Jinhao, Jiang, Jing, Zhao, Wayne Xin, Wen, Ji-Rong
Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Early studies mainly focused on answering simple questions over KBs and achieved great success. However, their performance on complex questions is still far from satisfactory. Therefore, in recent years, researchers propose a large number of novel methods, which looked into the challenges of answering complex questions. In this survey, we review recent advances on KBQA with the focus on solving complex questions, which usually contain multiple subjects, express compound relations, or involve numerical operations. In detail, we begin with introducing the complex KBQA task and relevant background. Then, we describe benchmark datasets for complex KBQA task and introduce the construction process of these datasets. Next, we present two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. Specifically, we illustrate their procedures with flow designs and discuss their major differences and similarities. After that, we summarize the challenges that these two categories of methods encounter when answering complex questions, and explicate advanced solutions and techniques used in existing work. Finally, we conclude and discuss several promising directions related to complex KBQA for future research.
Toward a Neural Semantic Parsing System for EHR Question Answering
Clinical semantic parsing (SP) is an important step toward identifying the exact information need (as a machine-understandable logical form) from a natural language query aimed at retrieving information from electronic health records (EHRs). Current approaches to clinical SP are largely based on traditional machine learning and require hand-building a lexicon. The recent advancements in neural SP show a promise for building a robust and flexible semantic parser without much human effort. Thus, in this paper, we aim to systematically assess the performance of two such neural SP models for EHR question answering (QA). We found that the performance of these advanced neural models on two clinical SP datasets is promising given their ease of application and generalizability. Our error analysis surfaces the common types of errors made by these models and has the potential to inform future research into improving the performance of neural SP models for EHR QA.