Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark

arXiv.org Artificial Intelligence

Recent studies show that crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases such as annotation artifacts. Models that exploit these superficial cues gain illusory advantages on the in-domain test set, which inflates evaluation results. The lack of trustworthy evaluation settings and benchmarks stalls the progress of NLI research. In this paper, we propose to assess a model's generalization performance reliably through cross-dataset evaluation. We present a new unified cross-dataset benchmark comprising 14 NLI datasets, and re-evaluate 9 widely used neural network-based NLI models as well as 5 recently proposed debiasing methods for annotation artifacts. Our proposed evaluation scheme and experimental baselines provide a basis for future reliable NLI research.
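The core idea of cross-dataset evaluation can be sketched in a few lines: score a fixed prediction function against every dataset in the benchmark, so a model that merely exploits one dataset's annotation artifacts is exposed by its scores elsewhere. The dataset names and the `predict` callable below are hypothetical placeholders, not the paper's actual benchmark code.

```python
def cross_dataset_accuracy(predict, datasets):
    """Return per-dataset accuracy for a prediction function.

    predict:  callable mapping (premise, hypothesis) -> label
    datasets: dict of name -> list of (premise, hypothesis, gold_label)
    """
    scores = {}
    for name, examples in datasets.items():
        correct = sum(
            1 for premise, hypothesis, gold in examples
            if predict(premise, hypothesis) == gold
        )
        scores[name] = correct / len(examples)
    return scores

# Toy usage with a trivial majority-class "model": it looks fine on a
# skewed dataset but the cross-dataset view reveals the weakness.
toy_datasets = {
    "dataset_a": [("a", "b", "entailment"), ("c", "d", "neutral")],
    "dataset_b": [("e", "f", "entailment")],
}
majority = lambda premise, hypothesis: "entailment"
print(cross_dataset_accuracy(majority, toy_datasets))
# {'dataset_a': 0.5, 'dataset_b': 1.0}
```

Reporting the full score dictionary (or its mean over held-out datasets) rather than a single in-domain number is what makes the evaluation harder to game.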


RGCL at SemEval-2020 Task 6: Neural Approaches to Definition Extraction

arXiv.org Artificial Intelligence

This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2. The system classifies definitions at both the sentence and token levels. It utilises state-of-the-art neural network architectures with task-specific adaptations, including an automatically extended training set. Overall, the approach achieves acceptable evaluation scores while maintaining flexibility in architecture selection.


How Good Is Your NLP Model Really?

#artificialintelligence

SageMaker Processing allows us to provision a GPU machine on demand, and only for the time needed to evaluate the model. To do so, we use a slightly modified evaluation script that can interact with the Processing job, and this time we run the evaluation on the entire test dataset, i.e. 15K records. Once the run is complete, we can find the evaluation results in a JSON file in the specified output folder in S3 (in our case the file is called evaluation.json). The evaluation results tell us that the Processing job managed to run 177 samples per second.
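Deriving that throughput figure from the job's output file is straightforward. The sketch below assumes a hypothetical schema for evaluation.json (the keys "num_samples" and "runtime_seconds", and the 84.7-second runtime, are illustrative — the real schema depends on what the evaluation script writes before the file lands in S3):

```python
import io
import json

# Stand-in for the evaluation.json the Processing job uploads to S3;
# in practice you would download it with boto3 and open the local file.
report = {"num_samples": 15000, "runtime_seconds": 84.7, "accuracy": 0.91}
buf = io.StringIO(json.dumps(report))

results = json.load(buf)
throughput = results["num_samples"] / results["runtime_seconds"]
print(f"{throughput:.0f} samples/sec")  # prints "177 samples/sec"
```

Because the Processing job only bills for the minutes the container runs, this throughput number directly translates into the cost of each full-test-set evaluation.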


TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

arXiv.org Artificial Intelligence

Various robustness evaluation methodologies have been proposed from different perspectives for different natural language processing (NLP) tasks. These methods have often focused on either universal or task-specific generalization capabilities. In this work, we propose a multilingual robustness evaluation platform for NLP tasks (TextFlint) that incorporates universal text transformations, task-specific transformations, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis. TextFlint enables practitioners to automatically evaluate their models from all aspects or to customize their evaluations as desired with just a few lines of code. To guarantee user acceptability, all the text transformations are linguistically based, and we provide a human evaluation for each one. TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of a model's robustness. To validate TextFlint's utility, we performed large-scale empirical evaluations (over 67,000 evaluations) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. Almost all models showed significant performance degradation, including a decline of more than 50% in BERT's prediction accuracy on tasks such as aspect-level sentiment classification, named entity recognition, and natural language inference. We therefore call for robustness to be included in model evaluation, so as to promote the healthy development of NLP technology.
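The mechanics behind a "universal text transformation" can be illustrated with a hand-rolled sketch — this is not the TextFlint API, just a minimal example of the evaluation pattern the abstract describes: perturb each correctly classified input (here, by swapping adjacent characters to simulate a keyboard typo) and measure how often the prediction flips.

```python
def swap_typo(text):
    """Swap the first adjacent pair of letters, simulating a keyboard typo."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha():
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            break
    return "".join(chars)

def robustness_drop(predict, examples, transform):
    """Fraction of correctly classified examples whose label flips
    after the transformation is applied."""
    flips = total = 0
    for text, gold in examples:
        if predict(text) != gold:
            continue  # only originally-correct examples count
        total += 1
        if predict(transform(text)) != gold:
            flips += 1
    return flips / total if total else 0.0

# Toy keyword classifier: brittle by construction.
predict = lambda t: "positive" if "good" in t else "negative"
examples = [("a good movie", "positive"), ("a bad movie", "negative")]
print(robustness_drop(predict, examples, swap_typo))  # prints 0.5
```

A full toolkit layers many such transformations (plus adversarial attacks and subpopulation slices) and aggregates the flips into the kind of analytical report the paper describes.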


My Process for Learning Natural Language Processing with Deep Learning

#artificialintelligence

I currently work as a Data Scientist at Informatica, and I thought I'd share my process for learning new things. Recently I've been wanting to explore Deep Learning further, especially Machine Vision and Natural Language Processing. I've been procrastinating a lot, mostly because it's been summer, but now that it's fall and getting dark early, I'm going to spend more of those dark evenings learning. The thing that deeply interests me is Deep Learning and Artificial Intelligence, partly out of intellectual curiosity and partly out of greed, since most businesses and products will incorporate Deep Learning/ML in some way. I started doing research and realized that an understanding of Deep Learning was within my reach, but also that I still have a lot to learn, more than I initially thought.