Grammars & Parsing
Generating Animations from Screenplays
Zhang, Yeyao, Tsipidi, Eleftheria, Schriber, Sasha, Kapadia, Mubbasir, Gross, Markus, Modi, Ashutosh
Automatically generating animation from natural language text finds application in a number of areas, e.g. movie script writing, instructional videos, and public safety. However, translating natural language text into animation is a challenging task. Existing text-to-animation systems can handle only very simple sentences, which limits their applications. In this paper, we develop a text-to-animation system which is capable of handling complex sentences. We achieve this by introducing a text simplification step into the process. Building on an existing animation generation system for screenwriting, we create a robust NLP pipeline to extract information from screenplays and map it to the system's knowledge base. We develop a set of linguistic transformation rules that simplify complex sentences. Information extracted from the simplified sentences is used to generate a rough storyboard and video depicting the text. Our sentence simplification module outperforms existing systems in terms of BLEU and SARI metrics. We further evaluated our system via a user study: 68% of participants believe that our system generates reasonable animation from input screenplays.
A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages
Cardenas, Ronald, Lin, Ying, Ji, Heng, May, Jonathan
Unsupervised part-of-speech (POS) tagging is often framed as a clustering problem, but practical taggers need to ground their clusters as well. Grounding generally requires reference labeled data, a luxury a low-resource language might not have. In this work, we describe an approach for low-resource unsupervised POS tagging that yields fully grounded output and requires no labeled training data. We find the classic method of Brown et al. (1992) clusters well in our use case and employ a decipherment-based approach to grounding. This approach presumes a sequence of cluster IDs is a 'ciphertext' and seeks a POS tag-to-cluster ID mapping that will reveal the POS sequence. We show intrinsically that, despite the difficulty of the task, we obtain reasonable performance across a variety of languages. We also show extrinsically that incorporating our POS tagger into a name tagger leads to state-of-the-art tagging performance in Sinhalese and Kinyarwanda, two languages with nearly no labeled POS data available. We further demonstrate our tagger's utility by incorporating it into a true 'zero-resource' variant of the MALOPA (Ammar et al., 2016) dependency parser model that removes the current reliance on multilingual resources and gold POS tags.
Figure 1: Overview of our approach to grounded POS tagging. We use an unsupervised clustering method (Section 3.2) then reduce and ground the clusters using a decipherment approach informed by POS tag sequence data from many languages (Section 3.3).
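The decipherment idea in the abstract above can be illustrated with a toy sketch. It is not the authors' method: it assumes a tiny invented tag-bigram table (`TAG_BIGRAMS`) in place of a POS tag language model learned from other languages, assigns exactly one tag to each cluster, and searches all assignments by brute force. The function names `score` and `decipher` are hypothetical.

```python
import math
from itertools import product

# Invented tag-bigram probabilities standing in for a POS tag language model.
# These numbers are assumptions for illustration, not values from the paper.
TAG_BIGRAMS = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
    ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.3, ("NOUN", "DET"): 0.1,
    ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.1,
}
TAGS = ["DET", "NOUN", "VERB"]


def score(tag_seq):
    """Log-probability of a tag sequence under the bigram table."""
    total, prev = 0.0, "<s>"
    for tag in tag_seq:
        total += math.log(TAG_BIGRAMS[(prev, tag)])
        prev = tag
    return total


def decipher(cluster_seqs):
    """Return the cluster-ID -> tag mapping whose decoded text scores best.

    The cluster-ID sequences play the role of the ciphertext; every way of
    assigning one tag per cluster is tried, which is only feasible for toy
    inventories.
    """
    clusters = sorted({c for seq in cluster_seqs for c in seq})
    best_mapping, best_score = None, -math.inf
    for assignment in product(TAGS, repeat=len(clusters)):
        mapping = dict(zip(clusters, assignment))
        total = sum(score([mapping[c] for c in seq]) for seq in cluster_seqs)
        if total > best_score:
            best_mapping, best_score = mapping, total
    return best_mapping


# Cluster IDs as an unsupervised clusterer might emit them (made-up data).
ciphertext = [[7, 3, 5, 7, 3], [7, 3, 5], [3, 5, 7, 3]]
mapping = decipher(ciphertext)
```

A realistic system would replace the exhaustive search with an approximate or probabilistic search, since the number of assignments grows exponentially with the number of clusters.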
A Unified Model for Joint Chinese Word Segmentation and Dependency Parsing
Yan, Hang, Qiu, Xipeng, Huang, Xuanjing
Chinese word segmentation and dependency parsing are two fundamental tasks for Chinese natural language processing. Dependency parsing is defined at the word level, so word segmentation is a precondition of dependency parsing, which makes dependency parsing suffer from error propagation. In this paper, we propose a unified model to integrate Chinese word segmentation and dependency parsing. Unlike previous joint models, our proposed model is graph-based and more concise, which reduces the feature engineering effort. Our joint model outperforms previous joint models and achieves state-of-the-art results in both Chinese word segmentation and dependency parsing.
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
Guo, Haohan, Soong, Frank K., He, Lei, Xie, Lei
End-to-end TTS, which can predict speech directly from a given sequence of graphemes or phonemes, has shown improved performance over conventional TTS. However, its predictive capability is still limited by the acoustic/phonetic coverage of the training data, which is usually constrained by the training set size. To further improve TTS quality in pronunciation, prosody and perceived naturalness, we propose to exploit the information embedded in a syntactically parsed tree, where the inter-phrase/word information of a sentence is organized in a multilevel tree structure. Specifically, two key features are investigated: phrase structure and relations between adjacent words. Experimental results in subjective listening, measured on three test sets, show that the proposed approach is effective in improving the pronunciation clarity, prosody and naturalness of the synthesized speech of the baseline system.
Joint Learning of Pre-Trained and Random Units for Domain Adaptation in Part-of-Speech Tagging
Meftah, Sara, Tamaazousti, Youssef, Semmar, Nasredine, Essafi, Hassane, Sadat, Fatiha
Fine-tuning neural networks is widely used to transfer valuable knowledge from high-resource to low-resource domains. In a standard fine-tuning scheme, source and target problems are trained using the same architecture. Although capable of adapting to new domains, pre-trained units struggle to learn uncommon target-specific patterns. In this paper, we propose to augment the target network with normalised, weighted and randomly initialised units that yield a better adaptation while maintaining the valuable source knowledge. Our experiments on POS tagging of social media texts (Tweets domain) demonstrate that our method achieves state-of-the-art performance on 3 commonly used datasets.
Domain Authoring Assistant for Intelligent Virtual Agents
Janghorbani, Sepehr, Modi, Ashutosh, Buhmann, Jakob, Kapadia, Mubbasir
Developing intelligent virtual characters has attracted a lot of attention in recent years. The process of creating such characters often involves a team of creative authors who describe different aspects of the characters in natural language, and planning experts who translate this description into a planning domain. This can be quite challenging, as the team of creative authors must diligently define every aspect of the character, especially if it exhibits complex human-like behavior. In addition, a team of engineers has to manually translate the natural language description of a character's personality into the planning domain knowledge. This can be extremely demanding in time and resources and can be an obstacle to the authors' creativity. The goal of this paper is to introduce an authoring assistant tool that automates the process of domain generation from natural language descriptions of virtual characters, thus bridging the creative authoring team and the planning domain experts. Moreover, the proposed tool also identifies possible missing information in the domain description and iteratively makes suggestions to the author.
A Survey of Code-switched Speech and Language Processing
Sitaram, Sunayana, Chandu, Khyathi Raghavi, Rallabandi, Sai Krishna, Black, Alan W
Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon that occurs in multilingual communities across the world. This survey reviews computational approaches for code-switched Speech and Natural Language Processing. We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities. As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for. We review code-switching research in various Speech and NLP applications, including language processing tools and end-to-end systems. We conclude with future directions and open problems in the field.
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
Kim, Gene Louis, Schubert, Lenhart
A growing interest in tasks involving language understanding by the NLP community has led to the need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs for language understanding: adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULF) for Episodic Logic (EL), which is an initial form for a semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved; they provide a starting point for further resolution into EL, and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize that a divide-and-conquer approach to semantic parsing starting with derivation of ULFs will lead to semantic analyses that do justice to subtle aspects of linguistic meaning, and will enable construction of more accurate semantic parsers.
Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds
Labutov, Igor, Yang, Bishan, Prakash, Anusha, Azaria, Amos
Question Answering (QA), as a research field, has primarily focused on either knowledge bases (KBs) or free text as a source of knowledge. These two sources have historically shaped the kinds of questions that are asked over these sources, and the methods developed to answer them. In this work, we look towards a practical use-case of QA over user-instructed knowledge that uniquely combines elements of both structured QA over knowledge bases, and unstructured QA over narrative, introducing the task of multi-relational QA over personal narrative. As a first step towards this goal, we make three key contributions: (i) we generate and release TextWorldsQA, a set of five diverse datasets, where each dataset contains dynamic narrative that describes entities and relations in a simulated world, paired with variably compositional questions over that knowledge, (ii) we perform a thorough evaluation and analysis of several state-of-the-art QA models and their variants at this task, and (iii) we release a lightweight Python-based framework we call TextWorlds for easily generating arbitrary additional worlds and narrative, with the goal of allowing the community to create and share a growing collection of diverse worlds as a test-bed for this task.
Cooperative Learning of Disjoint Syntax and Semantics
Havrylov, Serhii, Kruszewski, Germán, Joulin, Armand
There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near perfect accuracy on this task. Our model is composed of two separate modules for syntax and semantics. They are cooperatively trained with standard continuous and discrete optimization schemes. Our model does not require any linguistic structure for supervision and its recursive nature allows for out-of-domain generalization with little loss in performance. Additionally, our approach performs competitively on several natural language tasks, such as Natural Language Inference or Sentiment Analysis.
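To make the kind of task described above concrete, here is a minimal sketch of bracketed mathematical expressions defined by a small context-free grammar, with the syntax step (building a tree) kept separate from the semantics step (computing a value). The grammar, the operator set in `OPS`, and the function names are assumptions for illustration and are not taken from the paper; the paper's model learns both steps, whereas this sketch hard-codes them.

```python
# Assumed toy grammar (illustrative only):
#   expr -> DIGIT | "[" OP expr+ "]"
#   OP   -> "MAX" | "MIN" | "SUM_MOD_10"
OPS = {
    "MAX": max,
    "MIN": min,
    "SUM_MOD_10": lambda values: sum(values) % 10,
}


def tokenize(text):
    """Split an expression string into brackets, operators and digits."""
    return text.replace("[", " [ ").replace("]", " ] ").split()


def parse_expr(tokens, pos):
    """Syntax: parse one expr starting at tokens[pos]; return (tree, next_pos)."""
    token = tokens[pos]
    if token != "[":
        return int(token), pos + 1  # leaf: a single number
    op = tokens[pos + 1]
    if op not in OPS:
        raise ValueError(f"unknown operator: {op}")
    pos += 2
    children = []
    while tokens[pos] != "]":
        child, pos = parse_expr(tokens, pos)
        children.append(child)
    if not children:
        raise ValueError("an operator needs at least one argument")
    return (op, children), pos + 1  # skip the closing bracket


def parse(text):
    """Parse a whole string and reject trailing tokens."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def evaluate(tree):
    """Semantics: compute the value of a parsed tree bottom-up."""
    if isinstance(tree, int):
        return tree
    op, children = tree
    return OPS[op]([evaluate(child) for child in children])


tree = parse("[MAX 2 9 [MIN 4 7] 0]")
value = evaluate(tree)
```

The value of an expression depends on composing sub-results in the right order, which is why a model that recovers the wrong tree will generally also compute the wrong answer.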