"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.
I believe that ML research and advanced AI algorithms have little or no practical value until they can be applied to real-life projects without demanding an insane amount of resources or excessive domain knowledge from users. Hugging Face builds that bridge. It is home to thousands of pre-trained models, which have made great contributions to democratizing artificial intelligence through open source and open science. Today, I want to give you an end-to-end code demo that compares two of the most popular pre-trained models on a multi-label text classification task. The first model is SentenceTransformers (SBERT).
On the Internet, many sources publish enormous amounts of news every day. At the same time, user demand for information keeps growing, so it is important to classify the news in a way that lets users reach the information they are interested in quickly and efficiently. A news classifier would let users identify news topics that go untracked, and it could support recommendations based on their prior interests. Thus, we aim to build models that take news headlines and short descriptions as inputs and produce news categories as outputs. The problem we will tackle is the classification of BBC News articles into their categories.
We will cover the topics related to solving Multi-Class Text Classification problems, with sample implementations in a Python / TensorFlow / Keras environment. We will use a Kaggle dataset that contains 32 topics and more than 400K reviews in total. This tutorial series has several parts, each covering text classification with a different Deep Learning model. You can access all the parts, along with the code, videos, and posts of the series, from the index page linked below.
Text classification is the task of assigning an appropriate category to a sentence or document. The categories depend on the selected dataset and can cover arbitrary subjects. Text classifiers can therefore be used to organize, structure, and categorize any kind of text. Common approaches use supervised learning to classify texts. BERT-based language models in particular have achieved very good text classification results in recent years.
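As a rough illustration of the supervised approach mentioned above, here is a minimal sketch of a text classifier. It deliberately swaps in a multinomial naive Bayes model rather than a BERT-based one, so that it runs with the Python standard library only. The class name, the whitespace tokenizer, and the toy training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, text, label):
        """Update the counts with one labeled example."""
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        """Pick the label with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
clf = NaiveBayesClassifier()
clf.train("win a free prize now", "spam")
clf.train("free money claim your prize", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("please review the attached report", "ham")

print(clf.classify("claim your free prize"))          # spam
print(clf.classify("agenda for the monday meeting"))  # ham
```

The same train/classify interface would apply to a stronger model; only the scoring function changes.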
In this paper, we study the multi-task sentiment classification problem in the continual learning setting, i.e., a model is sequentially trained to classify the sentiment of reviews of products in a particular category. The use of common sentiment words in reviews of different product categories leads to large cross-task similarity, which differentiates it from continual learning in other domains. This knowledge-sharing nature renders approaches that focus on reducing forgetting less effective for the problem under consideration. Unlike existing approaches, where task-specific masks are learned with specifically presumed training objectives, we propose an approach called Task-aware Dropout (TaskDrop) to generate masks in a random way. While standard dropout generates and applies random masks for each training instance per epoch for effective regularization, TaskDrop applies random masking for task-wise capacity allocation and reuse. We conducted experimental studies on three multi-task review datasets and compared TaskDrop against various baselines and state-of-the-art approaches. Our empirical results show that, despite its simplicity, TaskDrop achieved competitive overall performance on all three datasets, especially after relatively long-term learning. This demonstrates that the proposed random capacity allocation mechanism works well for continual sentiment classification.
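The contrast between per-instance dropout masks and task-wise masks can be sketched as follows. This is not the authors' implementation: the helper names, the `keep_prob` value, and the seeding scheme are assumptions chosen only to illustrate the idea of drawing one fixed random mask per task.

```python
import random


def make_task_mask(task_id, hidden_size, keep_prob=0.5, seed=0):
    """Draw one fixed binary mask per task (hypothetical helper).

    Seeding with a value derived from (seed, task_id) makes the mask
    reproducible, so the same units are selected whenever the task is revisited.
    """
    rng = random.Random(seed * 100003 + task_id)
    return [1 if rng.random() < keep_prob else 0 for _ in range(hidden_size)]


def apply_mask(hidden, mask):
    """Zero out the hidden units that the mask does not allocate to the task."""
    return [h * m for h, m in zip(hidden, mask)]


def overlap(mask_a, mask_b):
    """Count units selected by both masks, i.e. capacity shared between tasks."""
    return sum(a & b for a, b in zip(mask_a, mask_b))


hidden = [0.5] * 16                      # stand-in for one layer's activations
mask_books = make_task_mask(task_id=0, hidden_size=16)
mask_dvds = make_task_mask(task_id=1, hidden_size=16)

print(apply_mask(hidden, mask_books))    # activations used for task 0
print(overlap(mask_books, mask_dvds))    # units reused across the two tasks
```

Because the masks are random rather than learned, two tasks will usually share some units, which is one way to read the "capacity allocation and reuse" described in the abstract.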
As the Internet grows in size, so does the amount of text-based information that exists. For many application spaces it is paramount to isolate and identify texts that relate to a particular topic. While one-class classification would be ideal for such analysis, there is a relative lack of research regarding efficient approaches with high predictive power. By noting that the range of documents we wish to identify can be represented as positive linear combinations of the Vector Space Model representing our text, we propose Conical classification, an approach that allows us to identify whether a document belongs to a particular topic in a computationally efficient manner. We also propose Normal Exclusion, a modified version of Bi-Normal Separation that is more suitable for the one-class classification context. We show in our analysis that our approach not only has higher predictive power on our datasets, but is also faster to compute.
Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org .
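For readers unfamiliar with the metric behind the 0.11 gap, the following sketch computes an F1 score averaged over classes. It assumes macro-averaging (each class weighted equally); the function names and the toy labels are made up for this example and are not taken from the RAFT evaluation code.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels, invented for this sketch.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here class "a" scores 2/3 and class "b" scores 4/5, so the macro average is 11/15.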
BERT is a state-of-the-art model introduced by Google in 2018. In this blog, I will go step by step through fine-tuning the BERT model for movie review classification (i.e., positive or negative). I will be using the PyTorch framework for the code. BERT is built on top of the transformer (explained in the paper Attention Is All You Need). Input sentences are first tokenized into WordPiece tokens, where pieces that continue a word carry the ## prefix, and then the special tokens [CLS] and [SEP] are added to the sequence.
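To make the tokenization step concrete, here is a minimal sketch of greedy longest-match-first WordPiece splitting followed by the addition of [CLS] and [SEP]. It is a toy, not the real BERT tokenizer: the seven-entry vocabulary and the function names are assumptions for illustration, and real tokenizers also handle punctuation, casing rules, and a vocabulary of tens of thousands of pieces.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def encode(sentence, vocab):
    """Lowercase, split on whitespace, apply WordPiece, add [CLS] and [SEP]."""
    tokens = ["[CLS]"]
    for word in sentence.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    tokens.append("[SEP]")
    return tokens


# Toy vocabulary, invented for this sketch.
TOY_VOCAB = {"the", "movie", "was", "un", "##believ", "##ably", "good"}

print(encode("The movie was unbelievably good", TOY_VOCAB))
# ['[CLS]', 'the', 'movie', 'was', 'un', '##believ', '##ably', 'good', '[SEP]']
```

In practice the tokens are then mapped to integer ids and passed to the model, and the output at the [CLS] position is commonly used as the sentence representation for classification.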
Fine-grained classification involves dealing with datasets with a larger number of classes and subtle differences between them. Guiding the model to focus on the dimensions that differentiate these commonly confusable classes is key to improving performance on fine-grained tasks. In this work, we analyse the contrastive fine-tuning of pre-trained language models on two fine-grained text classification tasks, emotion classification and sentiment analysis. We adaptively embed class relationships into a contrastive objective function to weigh the positives and negatives differently and, in particular, to weight closely confusable negatives more than less similar negative examples. We find that Label-aware Contrastive Loss outperforms previous contrastive methods in the presence of a larger number of classes and/or more confusable classes, and helps models produce output distributions that are more differentiated.