Goto

Collaborating Authors

 Text Classification


Azimuth: Systematic Error Analysis for Text Classification

arXiv.org Artificial Intelligence

We present Azimuth, an open-source and easy-to-use tool to perform error analysis for text classification. Compared to other stages of the ML development cycle, such as model training and hyper-parameter tuning, the process and tooling for the error analysis stage are less mature. However, this stage is critical for the development of reliable and trustworthy AI systems. To make error analysis more systematic, we propose an approach comprising dataset analysis and model quality assessment, which Azimuth facilitates. We aim to help AI practitioners discover and address areas where the model does not generalize by leveraging and integrating a range of ML techniques, such as saliency maps, similarity, uncertainty, and behavioral analyses, all in one tool. Our code and documentation are available at github.com/servicenow/azimuth.


Utilizing distilBert transformer model for sentiment classification of COVID-19's Persian open-text responses

arXiv.org Artificial Intelligence

The COVID-19 pandemic has caused drastic alternations in human's life in all aspects. The government's laws in this regard affected the lifestyle of all people. Due to this fact studying about the sentiment of individuals is important to be aware of the future impacts of the coming pandemics. To contribute to this aim, we proposed a NLP (Natural Language Processing) model to analyze open-text answers in a survey in Persian and detect positive and negative feelings of the people in Iran. In this study, a distilBert transformer model was applied to take on this task. We deployed three approaches to perform comparison, and our best model could gain accuracy: 0.824, Precision: 0.824, Recall: 0.798 and F1score: 0.804.


Improve Text Classification Accuracy with Intent Information

arXiv.org Artificial Intelligence

In addition, existing text classification approaches only consider the utterances in the coarse granularity level, which In recent years, goal-oriented dialogue systems have been may less the possibility of the model to explore relationship widely applied in intelligent voice assistant, e.g., Apple Siri, between token-level information and label information. For Amazon Alexa, where intent classification technology plays a example, in Figure 1, if there is a token "When" in the crucial part. Given input utterance in natural language, the intent input sentence, then the "NUM" intent is more likely to be classification module aims to detect the user's intent [10], recognized as high correlation with it, while the "LOC" intent [11], [14], [23]. Previous works have been proposed for better does not.


FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

arXiv.org Artificial Intelligence

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully-crafted class descriptions to obtain class-specific keywords but also require substantial amount of unlabeled data and takes a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords. Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.


Structured Prompting: Scaling In-Context Learning to 1,000 Examples

arXiv.org Artificial Intelligence

Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters. However, conventional in-context learning is usually restricted by length constraints, rendering it ineffective to absorb supervision from a large number of examples. In order to go beyond few shots, we introduce structured prompting that breaks the length limit and scales in-context learning to thousands of examples. Specifically, demonstration examples are separately encoded with well-designed position embeddings, and then they are jointly attended by the test example using a rescaled attention mechanism. So we can scale the number of exemplars with linear complexity instead of quadratic complexity with respect to length. Experimental results on a diverse set of tasks show that our approach improves end-task performance and reduces evaluation variance over conventional in-context learning as the number of demonstration examples increases. Code has been released at https://aka.ms/structured-prompting.


Moto: Enhancing Embedding with Multiple Joint Factors for Chinese Text Classification

arXiv.org Artificial Intelligence

Recently, language representation techniques have achieved great performances in text classification. However, most existing representation models are specifically designed for English materials, which may fail in Chinese because of the huge difference between these two languages. Actually, few existing methods for Chinese text classification process texts at a single level. However, as a special kind of hieroglyphics, radicals of Chinese characters are good semantic carriers. In addition, Pinyin codes carry the semantic of tones, and Wubi reflects the stroke structure information, \textit{etc}. Unfortunately, previous researches neglected to find an effective way to distill the useful parts of these four factors and to fuse them. In our works, we propose a novel model called Moto: Enhancing Embedding with \textbf{M}ultiple J\textbf{o}int Fac\textbf{to}rs. Specifically, we design an attention mechanism to distill the useful parts by fusing the four-level information above more effectively. We conduct extensive experiments on four popular tasks. The empirical results show that our Moto achieves SOTA 0.8316 ($F_1$-score, 2.11\% improvement) on Chinese news titles, 96.38 (1.24\% improvement) on Fudan Corpus and 0.9633 (3.26\% improvement) on THUCNews.


G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks

arXiv.org Artificial Intelligence

Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the previous general knowledge acquired by general PLMs, which leads to a catastrophic forgetting phenomenon and sub-optimal performance. To alleviate this problem, we propose a new framework of General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM by a memory representation built from the frozen general PLM without losing any general knowledge. Specifically, we propose a new memory-augmented layer, and based on it, different augmented strategies are explored to build the memory representation and then adaptively fuse it into the domain-specific PLM. We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds (text classification, QA, NER) of tasks, and the extensive results show that the proposed G-MAP can achieve SOTA results on all tasks.


Learning Label Modular Prompts for Text Classification in the Wild

arXiv.org Artificial Intelligence

Machine learning models usually assume i.i.d data during training and testing, but data and tasks in real world often change over time. To emulate the transient nature of real world, we propose a challenging but practical task: text classification in-the-wild, which introduces different non-stationary training/testing stages. Decomposing a complex task into modular components can enable robust generalisation under such non-stationary environment. However, current modular approaches in NLP do not take advantage of recent advances in parameter efficient tuning of pretrained language models. To close this gap, we propose MODULARPROMPT, a label-modular prompt tuning framework for text classification tasks. In MODULARPROMPT, the input prompt consists of a sequence of soft label prompts, each encoding modular knowledge related to the corresponding class label. In two of most formidable settings, MODULARPROMPT outperforms relevant baselines by a large margin demonstrating strong generalisation ability. We also conduct comprehensive analysis to validate whether the learned prompts satisfy properties of a modular representation.


Meta Learning for Few-Shot Medical Text Classification

arXiv.org Artificial Intelligence

Medical professionals frequently work in a data constrained setting to provide insights across a unique demographic. A few medical observations, for instance, informs the diagnosis and treatment of a patient. This suggests a unique setting for meta-learning, a method to learn models quickly on new tasks, to provide insights unattainable by other methods. We investigate the use of meta-learning and robustness techniques on a broad corpus of benchmark text and medical data. To do this, we developed new data pipelines, combined language models with meta-learning approaches, and extended existing meta-learning algorithms to minimize worst case loss. We find that meta-learning on text is a suitable framework for text-based data, providing better data efficiency and comparable performance to few-shot language models and can be successfully applied to medical note data. Furthermore, meta-learning models coupled with DRO can improve worst case loss across disease codes.


Generalised Spherical Text Embedding

arXiv.org Artificial Intelligence

This paper aims to provide an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes the words and the paragraphs as individual matrices of arbitrary column dimension with unit Frobenius norm. The representation is also linguistically motivated with the introduction of a novel similarity metric. The proposed modelling and the novel similarity metric exploits the matrix structure of embeddings. We then go on to show that the same matrices can be reshaped into vectors of unit norm and transform our problem into an optimization problem over the spherical manifold. We exploit manifold optimization to efficiently train the matrix embeddings. We also quantitatively verify the quality of our text embeddings by showing that they demonstrate improved results in document classification, document clustering, and semantic textual similarity benchmark tests.