Goto

Collaborating Authors

 Grammars & Parsing


Fine-grained Few-shot Recognition by Deep Object Parsing

arXiv.org Artificial Intelligence

We propose a new method for fine-grained few-shot recognition via deep object parsing. In our framework, an object is made up of K distinct parts and for each part, we learn a dictionary of templates, which is shared across all instances and categories. An object is parsed by estimating the locations of these K parts and a set of active templates that can reconstruct the part features. We recognize test instances by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. Our method is end-to-end trainable to learn part templates on-top of a convolutional backbone. To combat visual distortions such as orientation, pose and size, we learn templates at multiple scales, and at test-time parse and match instances across these scales. We show that our method is competitive with the state-of-the-art, and by virtue of parsing enjoys interpretability as well.


On the Explainability of Natural Language Processing Deep Models

arXiv.org Artificial Intelligence

While there has been a recent explosion of work on ExplainableAI ExAI on deep models that operate on imagery and tabular data, textual datasets present new challenges to the ExAI community. Such challenges can be attributed to the lack of input structure in textual data, the use of word embeddings that add to the opacity of the models and the difficulty of the visualization of the inner workings of deep models when they are trained on textual data. Lately, methods have been developed to address the aforementioned challenges and present satisfactory explanations on Natural Language Processing (NLP) models. However, such methods are yet to be studied in a comprehensive framework where common challenges are properly stated and rigorous evaluation practices and metrics are proposed. Motivated to democratize ExAI methods in the NLP field, we present in this work a survey that studies model-agnostic as well as model-specific explainability methods on NLP models. Such methods can either develop inherently interpretable NLP models or operate on pre-trained models in a post-hoc manner. We make this distinction and we further decompose the methods into three categories according to what they explain: (1) word embeddings (input-level), (2) inner workings of NLP models (processing-level) and (3) models' decisions (output-level). We also detail the different evaluation approaches interpretability methods in the NLP field. Finally, we present a case-study on the well-known neural machine translation in an appendix and we propose promising future research directions for ExAI in the NLP field.


Better Smatch = Better Parser? AMR evaluation is not so simple anymore

arXiv.org Artificial Intelligence

Recently, astonishing advances have been observed in AMR parsing, as measured by the structural Smatch metric. In fact, today's systems achieve performance levels that seem to surpass estimates of human inter annotator agreement (IAA). Therefore, it is unclear how well Smatch (still) relates to human estimates of parse quality, as in this situation potentially fine-grained errors of similar weight may impact the AMR's meaning to different degrees. We conduct an analysis of two popular and strong AMR parsers that -- according to Smatch -- reach quality levels on par with human IAA, and assess how human quality ratings relate to Smatch and other AMR metrics. Our main findings are: i) While high Smatch scores indicate otherwise, we find that AMR parsing is far from being solved: we frequently find structurally small, but semantically unacceptable errors that substantially distort sentence meaning. ii) Considering high-performance parsers, better Smatch scores may not necessarily indicate consistently better parsing quality. To obtain a meaningful and comprehensive assessment of quality differences of parse(r)s, we recommend augmenting evaluations with macro statistics, use of additional metrics, and more human analysis.


Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

arXiv.org Artificial Intelligence

Norwegian Twitter data poses an interesting challenge for Natural Language Processing (NLP) tasks. These texts are difficult for models trained on standardized text in one of the two Norwegian written forms (Bokm{\aa}l and Nynorsk), as they contain both the typical variation of social media text, as well as a large amount of dialectal variety. In this paper we present a novel Norwegian Twitter dataset annotated with POS-tags. We show that models trained on Universal Dependency (UD) data perform worse when evaluated against this dataset, and that models trained on Bokm{\aa}l generally perform better than those trained on Nynorsk. We also see that performance on dialectal tweets is comparable to the written standards for some models. Finally we perform a detailed analysis of the errors that models commonly make on this data.


PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation

arXiv.org Artificial Intelligence

Semantic role labeling (SRL) identifies the predicate-argument structure in a sentence. This task is usually accomplished in four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. Errors introduced at one step propagate to later steps. Unfortunately, the existing SRL evaluation scripts do not consider the full effect of this error propagation aspect. They either evaluate arguments independent of predicate sense (CoNLL09) or do not evaluate predicate sense at all (CoNLL05), yielding an inaccurate SRL model performance on the argument classification task. In this paper, we address key practical issues with existing evaluation scripts and propose a more strict SRL evaluation metric PriMeSRL. We observe that by employing PriMeSRL, the quality evaluation of all SoTA SRL models drops significantly, and their relative rankings also change. We also show that PriMeSRLsuccessfully penalizes actual failures in SoTA SRL models.


Applying FrameNet to Chinese(Poetry)

arXiv.org Artificial Intelligence

FrameNet( Fillmore and Baker [2009] ) is well-known for its wide use for knowledge representation in the form of inheritance-based ontologies and lexica( Trott et al. [2020] ). Although FrameNet is usually applied to languages like English, Spanish and Italian, there are still plenty of FrameNet data sets available for other languages like Chinese, which differs significantly from those languages based on Latin alphabets. In this paper, the translation from ancient Chinese Poetry to modern Chinese will be first conducted to further apply the Chinese FrameNet(CFN, provided by Shanxi University). Afterwards, the translation from modern Chinese will be conducted as well for the comparison between the applications of CFN and English FrameNet. Finally, the overall comparison will be draw between CFN to modern Chinese and English FrameNet.


Chinese Discourse Annotation Reference Manual

arXiv.org Artificial Intelligence

This document provides extensive guidelines and examples for Rhetorical Structure Theory (RST) annotation in Mandarin Chinese. The guideline is divided into three sections. We first introduce preprocessing steps to prepare data for RST annotation. Secondly, we discuss syntactic criteria to segment texts into Elementary Discourse Units (EDUs). Lastly, we provide examples to define and distinguish discourse relations in different genres. We hope that this reference manual can facilitate RST annotations in Chinese and accelerate the development of the RST framework across languages.


Aggregating Crowdsourced and Automatic Judgments to Scale Up a Corpus of Anaphoric Reference for Fiction and Wikipedia Texts

arXiv.org Artificial Intelligence

Although several datasets annotated for anaphoric reference/coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose. This new release is comparable in size to the largest existing corpora for anaphoric reference due in part to substantial activity by the players, in part thanks to the use of a new resolve-and-aggregate paradigm to 'complete' markable annotations through the combination of an anaphoric resolver and an aggregation method for anaphoric reference. The proposed method could be adopted to greatly speed up annotation time in other projects involving games-with-a-purpose. In addition, the corpus covers genres for which no comparable size datasets exist (Fiction and Wikipedia); it covers singletons and non-referring expressions; and it includes a substantial number of long documents (> 2K in length).


Event Extraction: A Survey

arXiv.org Artificial Intelligence

Extracting the reported events from text is one of the key research themes in natural language processing. This process includes several tasks such as event detection, argument extraction, role labeling. As one of the most important topics in natural language processing and natural language understanding, the applications of event extraction spans across a wide range of domains such as newswire, biomedical domain, history and humanity, and cyber security. This report presents a comprehensive survey for event detection from textual documents. In this report, we provide the task definition, the evaluation method, as well as the benchmark datasets and a taxonomy of methodologies for event extraction. We also present our vision of future research direction in event detection.


Self-move and Other-move: Quantum Categorical Foundations of Japanese

arXiv.org Artificial Intelligence

The purpose of this work is to contribute toward the larger goal of creating a Quantum Natural Language Processing (QNLP) translator program. This work contributes original diagrammatic representations of the Japanese language based on prior work that accomplished on the English language based on category theory. The germane differences between the English and Japanese languages are emphasized to help address English language bias in the current body of research. Additionally, topological principles of these diagrams and many potential avenues for further research are proposed. Why is this endeavor important? Hundreds of languages have developed over the course of millennia coinciding with the evolution of human interaction across time and geographic location. These languages are foundational to human survival, experience, flourishing, and living the good life. They are also, however, the strongest barrier between people groups. Over the last several decades, advancements in Natural Language Processing (NLP) have made it easier to bridge the gap between individuals who do not share a common language or culture. Tools like Google Translate and DeepL make it easier than ever before to share our experiences with people globally. Nevertheless, these tools are still inadequate as they fail to convey our ideas across the language barrier fluently, leaving people feeling anxious and embarrassed. This is particularly true of languages born out of substantially different cultures, such as English and Japanese. Quantum computers offer the best chance to achieve translation fluency in that they are better suited to simulating the natural world and natural phenomenon such as natural speech. Keywords: category theory, DisCoCat, DisCoCirc, Japanese grammar, English grammar, translation, topology, Quantum Natural Language Processing, Natural Language Processing