Text Classification
Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment
Jin, Di, Jin, Zhijing, Zhou, Joey Tianyi, Szolovits, Peter
Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present the TextFooler, a general attack framework, to generate natural adversarial texts. By successfully applying it to two fundamental natural language tasks, text classification and textual entailment, against various target models, convolutional and recurrent neural networks as well as the most powerful pre-trained BERT, we demonstrate the advantages of this framework in three ways: (i) effective---it outperforms state-of-the-art attacks in terms of success rate and perturbation rate; (ii) utility-preserving---it preserves semantic content and grammaticality, and remains correctly classified by humans; and (iii) efficient---it generates adversarial text with computational complexity linear in the text length.
BERT to the rescue!
In this post, I want to show how to apply BERT to a simple text classification problem. I assume that you're more or less familiar with what BERT is on a high level, and focus more on the practical side by showing you how to utilize it in your work. Roughly speaking, BERT is a model that knows to represent text. You give it some sequence as an input, it then looks left and right several times and produces a vector representation for each word as the output. In their paper, the authors describe two ways to work with BERT, one as with "feature extraction" mechanism.
BERT to the rescue!
In this post, I want to show how to apply BERT to a simple text classification problem. I assume that you're more or less familiar with what BERT is on a high level, and focus more on the practical side by showing you how to utilize it in your work. Roughly speaking, BERT is a model that knows to represent text. You give it some sequence as an input, it then looks left and right several times and produces a vector representation for each word as the output. In their paper, the authors describe two ways to work with BERT, one as with "feature extraction" mechanism.
A cost-reducing partial labeling estimator in text classification problem
Chen, Jiangning, Dai, Zhibo, Duan, Juntao, Hu, Qianli, Li, Ruilin, Matzinger, Heinrich, Popescu, Ionel, Zhai, Haoyan
We propose a new approach to address the text classification problems when learning with partial labels is beneficial. Instead of offering each training sample a set of candidate labels, we assign negative-oriented labels to the ambiguous training examples if they are unlikely fall into certain classes. We construct our new maximum likelihood estimators with self-correction property, and prove that under some conditions, our estimators converge faster. Also we discuss the advantages of applying one of our estimator to a fully supervised learning problem. The proposed method has potential applicability in many areas, such as crowdsourcing, natural language processing and medical image analysis.
Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification
Peng, Hao, Li, Jianxin, Gong, Qiran, Wang, Senzhang, He, Lifang, Li, Bo, Wang, Lihong, Yu, Philip S.
CNNs, RNNs, GCNs, and CapsNets have shown significant insights in representation learning and are widely used in various text mining tasks such as large-scale multi-label text classification. However, most existing deep models for multi-label text classification consider either the non-consecutive and long-distance semantics or the sequential semantics, but how to consider them both coherently is less studied. In addition, most existing methods treat output labels as independent methods, but ignore the hierarchical relations among them, leading to useful semantic information loss. In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNNs framework for large-scale multi-label text classification. Specifically, we first propose to model each document as a word order preserved graph-of-words and normalize it as a corresponding words-matrix representation which preserves both the non-consecutive, long-distance and local sequential semantics. Then the words-matrix is input to the proposed attentional graph capsule recurrent CNNs for more effectively learning the semantic features. To leverage the hierarchical relations among the class labels, we propose a hierarchical taxonomy embedding method to learn their representations, and define a novel weighted margin loss by incorporating the label representation similarity. Extensive evaluations on three datasets show that our model significantly improves the performance of large-scale multi-label text classification by comparing with state-of-the-art approaches.
Forward and Backward Knowledge Transfer for Sentiment Classification
Wang, Hao, Liu, Bing, Wang, Shuai, Ma, Nianzu, Yang, Yan
This paper studies the problem of learning a sequence of sentiment classification tasks. The learned knowledge from each task is retained and used to help future or subsequent task learning. This learning paradigm is called Lifelong Learning (LL). However, existing LL methods either only transfer knowledge forward to help future learning and do not go back to improve the model of a previous task or require the training data of the previous task to retrain its model to exploit backward/reverse knowledge transfer. This paper studies reverse knowledge transfer of LL in the context of naive Bayesian (NB) classification. It aims to improve the model of a previous task by leveraging future knowledge without retraining using its training data. This is done by exploiting a key characteristic of the generative model of NB. That is, it is possible to improve the NB classifier for a task by improving its model parameters directly by using the retained knowledge from other tasks. Experimental results show that the proposed method markedly outperforms existing LL baselines.
Topic Sensitive Attention on Generic Corpora Corrects Sense Bias in Pretrained Embeddings
Piratla, Vihari, Sarawagi, Sunita, Chakrabarti, Soumen
Given a small corpus $\mathcal D_T$ pertaining to a limited set of focused topics, our goal is to train embeddings that accurately capture the sense of words in the topic in spite of the limited size of $\mathcal D_T$. These embeddings may be used in various tasks involving $\mathcal D_T$. A popular strategy in limited data settings is to adapt pre-trained embeddings $\mathcal E$ trained on a large corpus. To correct for sense drift, fine-tuning, regularization, projection, and pivoting have been proposed recently. Among these, regularization informed by a word's corpus frequency performed well, but we improve upon it using a new regularizer based on the stability of its cooccurrence with other words. However, a thorough comparison across ten topics, spanning three tasks, with standardized settings of hyper-parameters, reveals that even the best embedding adaptation strategies provide small gains beyond well-tuned baselines, which many earlier comparisons ignored. In a bold departure from adapting pretrained embeddings, we propose using $\mathcal D_T$ to probe, attend to, and borrow fragments from any large, topic-rich source corpus (such as Wikipedia), which need not be the corpus used to pretrain embeddings. This step is made scalable and practical by suitable indexing. We reach the surprising conclusion that even limited corpus augmentation is more useful than adapting embeddings, which suggests that non-dominant sense information may be irrevocably obliterated from pretrained embeddings and cannot be salvaged by adaptation.
Semi-Unsupervised Lifelong Learning for Sentiment Classification: Less Manual Data Annotation and More Self-Studying
Hong, Xianbin, Pal, Gautam, Guan, Sheng-Uei, Wong, Prudence, Liu, Dawei, Man, Ka Lok, Huang, Xin
Lifelong machine learning is a novel machine learning paradigm which can continually accumulate knowledge during learning. The knowledge extracting and reusing abilities enable the lifelong machine learning to solve the related problems. The traditional approaches like Na\"ive Bayes and some neural network based approaches only aim to achieve the best performance upon a single task. Unlike them, the lifelong machine learning in this paper focuses on how to accumulate knowledge during learning and leverage them for further tasks. Meanwhile, the demand for labelled data for training also is significantly decreased with the knowledge reusing. This paper suggests that the aim of the lifelong learning is to use less labelled data and computational cost to achieve the performance as well as or even better than the supervised learning.
The 50 Best Free Datasets for Machine Learning Lionbridge AI
This article is also available in Japanese and Simplified Chinese. Lionbridge AI has assembled a wealth of resources for machine learning and natural language processing activities. In our previous articles, we explained why datasets are such an integral part of machine learning and natural language processing. Without training datasets, machine-learning algorithms would have no way of learning how to do text mining, text classification, or categorize products. This article is the ultimate list of open datasets for machine learning.
Unsupervised automatic classification of Scanning Electron Microscopy (SEM) images of CD4+ cells with varying extent of HIV virion infection
Wandeto, John M., Dresp-Langley, Birgitta
Archiving large sets of medical or cell images in digital libraries may require ordering randomly scattered sets of image data according to specific criteria, such as the spatial extent of a specific local color or contrast content that reveals different meaningful states of a physiological structure, tissue, or cell in a certain order, indicating progression or recession of a pathology, or the progressive response of a cell structure to treatment. Here we used a Self Organized Map (SOM)-based, fully automatic and unsupervised, classification procedure described in our earlier work and applied it to sets of minimally processed grayscale and/or color processed Scanning Electron Microscopy (SEM) images of CD4+ T-lymphocytes (so-called helper cells) with varying extent of HIV virion infection. It is shown that the quantization error in the SOM output after training permits to scale the spatial magnitude and the direction of change (+ or -) in local pixel contrast or color across images of a series with a reliability that exceeds that of any human expert. The procedure is easily implemented and fast, and represents a promising step towards low-cost automatic digital image archiving with minimal intervention of a human operator.