Text Classification
BagBERT: BERT-based bagging-stacking for multi-topic classification
Rakotoson, Loïc, Letaillieur, Charles, Massip, Sylvain, Laleye, Fréjus
This paper describes our submission on the COVID-19 literature annotation task at Biocreative VII. We proposed an approach that exploits the knowledge of the globally non-optimal weights, usually rejected, to build a rich representation of each label. Our proposed approach consists of two stages: (1) A bagging of various initializations of the training data that features weakly trained weights, (2) A stacking of heterogeneous vocabulary models based on BERT and RoBERTa Embeddings. The aggregation of these weak insights performs better than a classical globally efficient model. The purpose is the distillation of the richness of knowledge to a simpler and lighter model. Our system obtains an Instance-based F1 of 92.96 and a Label-based micro-F1 of 91.35.
TaskDrop: A Competitive Baseline for Continual Learning of Sentiment Classification
Mei, Jianping, Zheng, Yilun, Zhou, Qianwei, Yan, Rui
In this paper, we study the multi-task sentiment classification problem in the continual learning setting, i.e., a model is sequentially trained to classifier the sentiment of reviews of products in a particular category. The use of common sentiment words in reviews of different product categories leads to large cross-task similarity, which differentiates it from continual learning in other domains. This knowledge sharing nature renders forgetting reduction focused approaches less effective for the problem under consideration. Unlike existing approaches, where task-specific masks are learned with specifically presumed training objectives, we propose an approach called Task-aware Dropout (TaskDrop) to generate masks in a random way. While the standard dropout generates and applies random masks for each training instance per epoch for effective regularization, TaskDrop applies random masking for task-wise capacity allocation and reuse. We conducted experimental studies on three multi-task review datasets and made comparison to various baselines and state-of-the-art approaches. Our empirical results show that regardless of simplicity, TaskDrop overall achieved competitive performances for all the three datasets, especially after relative long term learning. This demonstrates that the proposed random capacity allocation mechanism works well for continual sentiment classification.
OpenPrompt: An Open-source Framework for Prompt-learning
Ding, Ning, Hu, Shengding, Zhao, Weilin, Chen, Yulin, Liu, Zhiyuan, Zheng, Hai-Tao, Sun, Maosong
Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to $cloze$-style prediction, autoregressive modeling, or sequence to sequence generation, resulting in promising performances on various tasks. However, no standard implementation framework of prompt-learning is proposed yet, and most existing prompt-learning codebases, often unregulated, only provide limited implementations for specific scenarios. Since there are many details such as templating strategy, initializing strategy, and verbalizing strategy, etc. need to be considered in prompt-learning, practitioners face impediments to quickly adapting the desired prompt learning methods to their applications. In this paper, we present {OpenPrompt}, a unified easy-to-use toolkit to conduct prompt-learning over PLMs. OpenPrompt is a research-friendly framework that is equipped with efficiency, modularity, and extendibility, and its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm. Users could expediently deploy prompt-learning frameworks and evaluate the generalization of them on different NLP tasks without constraints. OpenPrompt is publicly released at {\url{ https://github.com/thunlp/OpenPrompt}}.
Conical Classification For Computationally Efficient One-Class Topic Determination
As the Internet grows in size, so does the amount of text based information that exists. For many application spaces it is paramount to isolate and identify texts that relate to a particular topic. While one-class classification would be ideal for such analysis, there is a relative lack of research regarding efficient approaches with high predictive power. By noting that the range of documents we wish to identify can be represented as positive linear combinations of the Vector Space Model representing our text, we propose Conical classification, an approach that allows us to identify if a document is of a particular topic in a computationally efficient manner. We also propose Normal Exclusion, a modified version of Bi-Normal Separation that makes it more suitable within the one-class classification context. We show in our analysis that our approach not only has higher predictive power on our datasets, but is also faster to compute.
Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification
Wang, Yaqing, Wang, Song, Yao, Quanming, Dou, Dejing
Short text classification is a fundamental task in natural language processing. It is hard due to the lack of context information and labeled data in practice. In this paper, we propose a new method called SHINE, which is based on graph neural network (GNN), for short text classification. First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs which introduce more semantic and syntactic information. Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts. Thus, compared with existing GNN-based methods, SHINE can better exploit interactions between nodes of the same types and capture similarities between short texts. Extensive experiments on various benchmark short text datasets show that SHINE consistently outperforms state-of-the-art methods, especially with fewer labels.
Towards Math-Aware Automated Classification and Similarity Search of Scientific Publications: Methods of Mathematical Content Representations
In this paper, we investigate mathematical content representations suitable for the automated classification of and the similarity search in STEM documents using standard machine learning algorithms: the Latent Dirichlet Allocation (LDA) and the Latent Semantic Indexing (LSI). The methods are evaluated on a subset of arXiv.org papers with the Mathematics Subject Classification (MSC) as a reference classification and using the standard precision/recall/F1-measure metrics. The results give insight into how different math representations may influence the performance of the classification and similarity search tasks in STEM repositories. Non-surprisingly, machine learning methods are able to grab distributional semantics from textual tokens. A proper selection of weighted tokens representing math may improve the quality of the results slightly. A structured math representation that imitates successful text-processing techniques with math is shown to yield better results than flat TeX tokens.
TENT: Text Classification Based on ENcoding Tree Learning
Zhang, Chong, Wu, Junran, Zhu, He, Xu, Ke
Text classification is a primary task in natural language processing (NLP). Recently, graph neural networks (GNNs) have developed rapidly and been applied to text classification tasks. Although more complex models tend to achieve better performance, research highly depends on the computing power of the device used. In this article, we propose TENT (https://github.com/Daisean/TENT) to obtain better text classification performance and reduce the reliance on computing power. Specifically, we first establish a dependency analysis graph for each text and then convert each graph into its corresponding encoding tree. The representation of the entire graph is obtained by updating the representation of the non-leaf nodes in the encoding tree. Experimental results show that our method outperforms other baselines on several datasets while having a simple structure and few parameters.
Fine-Tuning BERT for text-classification in Pytorch
BERT is a state-of-the-art model by Google that came in 2019. In this blog, I will go step by step to finetune the BERT model for movie reviews classification(i.e positive or negative). Here, I will be using the Pytorch framework for the coding perspective. BERT is built on top of the transformer (explained in paper Attention is all you Need). Input text sentences would first be tokenized into words, then the special tokens ( [CLS], [SEP], ##token) will be added to the sequence of words.
Multimodal Classification: Current Landscape, Taxonomy and Future Directions
Sleeman, William C. IV, Kapoor, Rishabh, Ghosh, Preetam
Multimodal classification research has been gaining popularity in many domains that collect more data from multiple sources including satellite imagery, biometrics, and medicine. However, the lack of consistent terminology and architectural descriptions makes it difficult to compare different existing solutions. We address these challenges by proposing a new taxonomy for describing such systems based on trends found in recent publications on multimodal classification. Many of the most difficult aspects of unimodal classification have not yet been fully addressed for multimodal datasets including big data, class imbalance, and instance level difficulty. We also provide a discussion of these challenges and future directions.
Hierarchy-Aware T5 with Path-Adaptive Mask Mechanism for Hierarchical Text Classification
Huang, Wei, Liu, Chen, Zhao, Yihua, Yang, Xinyun, Pan, Zhaoming, Zhang, Zhimin, Liu, Guiquan
Hierarchical Text Classification (HTC), which aims to predict text labels organized in hierarchical space, is a significant task lacking in investigation in natural language processing. Existing methods usually encode the entire hierarchical structure and fail to construct a robust label-dependent model, making it hard to make accurate predictions on sparse lower-level labels and achieving low Macro-F1. In this paper, we propose a novel PAMM-HiA-T5 model for HTC: a hierarchy-aware T5 model with path-adaptive mask mechanism that not only builds the knowledge of upper-level labels into low-level ones but also introduces path dependency information in label prediction. Specifically, we generate a multi-level sequential label structure to exploit hierarchical dependency across different levels with Breadth-First Search (BFS) and T5 model. To further improve label dependency prediction within each path, we then propose an original path-adaptive mask mechanism (PAMM) to identify the label's path information, eliminating sources of noises from other paths. Comprehensive experiments on three benchmark datasets show that our novel PAMM-HiA-T5 model greatly outperforms all state-of-the-art HTC approaches especially in Macro-F1. The ablation studies show that the improvements mainly come from our innovative approach instead of T5.