Collaborating Authors

Text Classification

Text Classification using Neural Networks


Understanding how chatbots work is important. A fundamental piece of machinery inside a chat-bot is the text classifier. Let's look at the inner workings of an artificial neural network (ANN) for text classification.

Document Classification


Document or text classification is one of the predominant tasks in Natural language processing. It has many applications including news type classification, spam filtering, toxic comment identification, etc. In big organizations the datasets are large and training deep learning text classification models from scratch is a feasible solution but for the majority of real-life problems your dataset is small and if you want to build your machine learning model you need to be smart. In this article (originally posted by Shahul Es on the, I will talk about pragmatic approaches towards text representation which make document classification on small datasets doable.

Weight Poisoning Attacks on Pre-trained Models Machine Learning

Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then fine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct ``weight poisoning'' attacks where pre-trained weights are injected with vulnerabilities that expose ``backdoors'' after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at

VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification Machine Learning

Much progress has been made recently on text classification with methods based on neural networks. In particular, models using attention mechanism such as BERT have shown to have the capability of capturing the contextual information within a sentence or document. However, their ability of capturing the global information about the vocabulary of a language is more limited. This latter is the strength of Graph Convolutional Networks (GCN). In this paper, we propose VGCN-BERT model which combines the capability of BERT with a Vocabulary Graph Convolutional Network (VGCN). Local information and global information interact through different layers of BERT, allowing them to influence mutually and to build together a final representation for classification. In our experiments on several text classification datasets, our approach outperforms BERT and GCN alone, and achieve higher effectiveness than that reported in previous studies.

Deep Learning Based Text Classification: A Comprehensive Review Machine Learning

Deep learning based models have surpassed classical machine learning based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this work, we provide a detailed review of more than 150 deep learning based models for text classification developed in recent years, and discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and discuss future research directions.

In Automation We Trust: Investigating the Role of Uncertainty in Active Learning Systems Machine Learning

We investigate how different active learning (AL) query policies coupled with classification uncertainty visualizations affect analyst trust in automated classification systems. A current standard policy for AL is to query the oracle (e.g., the analyst) to refine labels for datapoints where the classifier has the highest uncertainty. This is an optimal policy for the automation system as it yields maximal information gain. However, model-centric policies neglect the effects of this uncertainty on the human component of the system and the consequent manner in which the human will interact with the system post-training. In this paper, we present an empirical study evaluating how AL query policies and visualizations lending transparency to classification influence trust in automated classification of image data. We found that query policy significantly influences an analyst's trust in an image classification system, and we use these results to propose a set of oracle query policies and visualizations for use during AL training phases that can influence analyst trust in classification.

Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes Machine Learning

Machine learning-based multi-label medical text classifications can be used to enhance the understanding of the human body and aid the need for patient care. We present a broad study on clinical natural language processing techniques to maximise a feature representing text when predicting medical codes on patients with multi-morbidity. We present results of multi-label medical text classification problems with 18, 50 and 155 labels. We compare several variations to embeddings, text tagging, and pre-processing. For imbalanced data we show that labels which occur infrequently, benefit the most from additional features incorporated in embeddings. We also show that high dimensional embeddings pre-trained using health-related data present a significant improvement in a multi-label setting, similarly to the way they improve performance for binary classification. High dimensional embeddings from this research are made available for public use.

Multi-Label Text Classification using Attention-based Graph Neural Network Machine Learning

In Multi-Label Text Classification (MLTC), one sample can belong to more than one class. It is observed that most MLTC tasks, there are dependencies or correlations among labels. Existing methods tend to ignore the relationship among labels. In this paper, a graph attention network-based model is proposed to capture the attentive dependency structure among the labels. The graph attention network uses a feature matrix and a correlation matrix to capture and explore the crucial dependencies between the labels and generate classifiers for the task. The generated classifiers are applied to sentence feature vectors obtained from the text feature extraction network (BiLSTM) to enable end-to-end training. Attention allows the system to assign different weights to neighbor nodes per label, thus allowing it to learn the dependencies among labels implicitly. The results of the proposed model are validated on five real-world MLTC datasets. The proposed model achieves similar or better performance compared to the previous state-of-the-art models.

6 Pretrained Models to Master Text Classification


Though ERNIE 1.0 (released in March 2019) has been a popular model for text classification, it was ERNIE 2.0 which became the talk of the town in the latter half of 2019. Developed by tech-giant Baidu, ERNIE outperformed Google XLNet and BERT on the GLUE benchmark for English. ERNIE stands for Enhanced Representation through kNowledge IntEgration, and ERNIE 2.0 is an upgraded version of ERNIE 1.0. ERNIE 1.0 was pathbreaking in its own way – it was one of the first models to leverage Knowledge Graphs. This incorporation further enhanced training the model for advanced tasks like Relation Classification and NamedEntityRecognition (NER). Like its predecessor, ERNIE 2.0 brings another innovation to the table in the form of Continual Incremental Multi-task Learning.

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

Neural Information Processing Systems

Extreme multi-label text classification (XMTC) is an important problem in the era of {\it big data}, for tagging a given text with the most relevant multiple labels from an extremely large-scale label set. XMTC can be found in many applications, such as item categorization, web page tagging, and news annotation. Traditionally most methods used bag-of-words (BOW) as inputs, ignoring word context as well as deep semantic information. Recent attempts to overcome the problems of BOW by deep learning still suffer from 1) failing to capture the important subtext for each label and 2) lack of scalability against the huge number of labels. We propose a new label tree-based deep learning model for XMTC, called AttentionXML, with two unique features: 1) a multi-label attention mechanism with raw text as input, which allows to capture the most relevant part of text to each label; and 2) a shallow and wide probabilistic label tree (PLT), which allows to handle millions of labels, especially for "tail labels".