
MapRE: An Effective Semantic Mapping Approach for Low-resource Relation Extraction

Neural relation extraction models have shown promising results in recent years; however, their performance drops dramatically when only a few training samples are available. Recent works have tried to leverage advances in few-shot learning to address the low-resource problem by training label-agnostic models that directly compare the semantic similarities among context sentences in the embedding space. However, label-aware information, i.e., the relation label, which carries semantic knowledge about the relation itself, is often neglected for prediction. In this work, we propose a framework that considers both label-agnostic and label-aware semantic mapping information for low-resource relation extraction. We show that incorporating these two types of mapping information in both pretraining and fine-tuning can significantly improve model performance on low-resource relation extraction tasks.
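To make the distinction between the two kinds of mapping information concrete, here is a minimal sketch that scores a query sentence against each candidate relation in two ways: a label-agnostic score (similarity to the mean of that relation's support-sentence embeddings, as in prototype-style few-shot learners) and a label-aware score (similarity to an embedding of the relation label itself). The function names, the cosine similarity, and the mixing weight `alpha` are all illustrative assumptions, not the MapRE authors' actual formulation; the embeddings are assumed to come from some pretrained encoder.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of the support embeddings (a 'prototype')."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(
    query: Vector,
    support: Dict[str, List[Vector]],  # relation -> embeddings of its support sentences
    label_emb: Dict[str, Vector],      # relation -> embedding of the relation label text
    alpha: float = 0.5,                # hypothetical weight between the two signals
) -> Tuple[str, Dict[str, float]]:
    scores: Dict[str, float] = {}
    for rel, examples in support.items():
        agnostic = cosine(query, mean(examples))  # compare with context sentences only
        aware = cosine(query, label_emb[rel])     # compare with the relation label itself
        scores[rel] = alpha * agnostic + (1.0 - alpha) * aware
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    support = {"founded_by": [[0.9, 0.1], [1.0, 0.0]], "born_in": [[0.0, 1.0]]}
    label_emb = {"founded_by": [1.0, 0.2], "born_in": [0.1, 1.0]}
    print(classify([1.0, 0.0], support, label_emb)[0])
```

Setting `alpha = 1.0` recovers a purely label-agnostic comparator, which is the setting the abstract argues leaves useful label semantics unused.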

Interpretable Charge Prediction for Criminal Cases with Dynamic Rationale Attention

Journal of Artificial Intelligence Research

Charge prediction, which aims to determine appropriate charges for criminal cases based on textual fact descriptions, is an important technology in the field of AI and law. Previous works focus on improving prediction accuracy while ignoring interpretability, which limits their applicability. In this work, we propose a deep neural framework that extracts short but charge-decisive text snippets, called rationales, from the input fact description as the interpretation of charge prediction. To address the scarcity of rationale-annotated corpora, rationales are extracted in a reinforcement-learning style, with charge labels as the only supervision. We further propose a dynamic rationale attention mechanism to better utilize the information in the extracted rationales when predicting charges. Experimental results show that, besides providing an interpretation of charge prediction, our approach can also capture subtle details that help charge prediction.
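The core idea of attending only to extracted rationales can be illustrated with generic masked attention pooling: tokens outside the rationale mask receive zero attention weight, so the pooled representation used for prediction depends only on rationale tokens. This is a minimal, static sketch under stated assumptions. The binary `mask` stands in for the output of the rationale extractor, and the `query` vector (for example, a learned charge embedding) is hypothetical. It is not the authors' dynamic rationale attention mechanism, whose details the abstract does not give.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax_masked(scores: List[float], mask: List[int]) -> List[float]:
    """Softmax over scores, restricted to positions where mask == 1."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        raise ValueError("rationale mask selects no tokens")
    mx = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def rationale_attention(
    token_vecs: List[Vector],  # one embedding per token of the fact description
    query: Vector,             # hypothetical query vector, e.g. a charge embedding
    mask: List[int],           # 1 = token selected as rationale, 0 = ignored
) -> Tuple[Vector, List[float]]:
    # Dot-product relevance of each token to the query.
    scores = [sum(t * q for t, q in zip(vec, query)) for vec in token_vecs]
    weights = softmax_masked(scores, mask)
    dim = len(token_vecs[0])
    # Weighted sum of token embeddings; non-rationale tokens contribute nothing.
    pooled = [sum(w * vec[d] for w, vec in zip(weights, token_vecs)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = rationale_attention(tokens, [1.0, 0.0], [1, 0, 1])
    print(pooled, weights)
```

In a trained system, the mask would be sampled by the extractor and rewarded by downstream prediction quality; here it is simply given as input.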

Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction

We investigate the problem of Chinese Grammatical Error Correction (CGEC) and present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction to address the deep issues hidden in CGEC. Since most tokens are correct and can be conveyed directly from source to target, and error positions can be estimated and corrected based on bidirectional context information, we employ a BERT-initialized Transformer Encoder as the backbone model to conduct information modeling and conveying. Relying only on same-position substitution cannot handle variable-length correction cases, because various operations such as substitution, deletion, insertion, and local paraphrasing are required jointly. Therefore, a Conditional Random Fields (CRF) layer is stacked on the upper tail to conduct non-autoregressive sequence prediction by modeling token dependencies. Because most tokens are correct and easy to predict or convey to the target, the models may suffer from a severe class imbalance issue. To alleviate this problem, focal loss penalty strategies are integrated into the loss functions. Moreover, in addition to the typical fixed-length error correction datasets, we also construct a variable-length corpus for experiments. Experimental results on standard datasets, especially the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure on both error Detection and Correction tasks.
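The class-imbalance argument is easier to see with the standard focal loss formula, $\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the probability the model assigns to the gold token. The factor $(1 - p_t)^{\gamma}$ shrinks the loss on easy, already-correct tokens, so the many trivially copied tokens contribute less. The sketch below is a token-level version of that standard formula. The abstract does not specify which focal loss variant TtT uses or its $\gamma$, so `gamma=2.0` is an assumed, commonly used default.

```python
import math
from typing import List


def focal_loss(
    probs: List[List[float]],  # per-token predicted distributions over the vocabulary
    targets: List[int],        # gold token index for each position
    gamma: float = 2.0,        # focusing parameter (assumed default)
    eps: float = 1e-12,        # guards against log(0)
) -> float:
    """Mean token-level focal loss; gamma = 0 reduces to cross-entropy."""
    total = 0.0
    for p, y in zip(probs, targets):
        p_t = max(p[y], eps)
        # Easy tokens (p_t near 1) are down-weighted by (1 - p_t) ** gamma.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(targets)


if __name__ == "__main__":
    easy = [[0.99, 0.01]]  # a correct token the model copies confidently
    hard = [[0.30, 0.70]]  # an erroneous position the model is unsure about
    print(focal_loss(easy, [0]), focal_loss(hard, [0]))
```

With $\gamma = 2$, the easy token's loss is scaled by $0.01^2$, while the hard token keeps most of its cross-entropy, which is the rebalancing effect the abstract relies on.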

A Quick Dive into Deep Learning: From Neural Cells to BERT


As a milestone in the natural language processing field, Bidirectional Encoder Representations from Transformers (BERT) did not appear out of nowhere. Rather, this complex model is the product of a long line of progress in deep learning and neural network models. In this article, written by Shi En, Feng Yin, and Tiao Can from the dialog algorithm team at Ant Financial, we look at the evolution of some of the major deep learning models we have come to know and use today, from the very simplest to the most complex: from a simple neural cell to BERT, one of the most complex models in use today. The article discusses how deep learning for natural language processing has evolved and developed, and considers the future direction of natural language processing based on industry trends.

Salience-Aware Event Chain Modeling for Narrative Understanding

Storytelling, whether via fables, news reports, documentaries, or memoirs, can be thought of as the communication of interesting and related events that, taken together, form a concrete process. It is desirable to extract the event chains that represent such processes. However, this extraction remains a challenging problem. We posit that this is due to the nature of the texts from which chains are discovered. Natural language text interleaves a narrative of concrete, salient events with background information, contextualization, opinion, and other elements that are important for a variety of necessary discourse and pragmatic acts but are not part of the principal chain of events being communicated. We introduce methods for extracting this principal chain from natural language text by filtering away non-salient events and supportive sentences. We demonstrate the effectiveness of our methods at isolating critical event chains by comparing their effect on downstream tasks. We show that by pre-training large language models on our extracted chains, we obtain improvements in two tasks that benefit from a clear understanding of event chains: narrative prediction and event-based temporal question answering. The demonstrated improvements and ablation studies confirm that our extraction method isolates critical event chains.