"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.
I assume that you're more or less familiar with what BERT is on a high level, and focus more on the practical side by showing you how to utilize it in your work. Roughly speaking, BERT is a model that knows to represent text. You give it some sequence as an input, it then looks left and right several times and produces a vector representation for each word as the output. In their paper, the authors describe two ways to work with BERT, one as with "feature extraction" mechanism. That is, we use the final output of BERT as an input to another model.
In this post, I want to show how to apply BERT to a simple text classification problem. I assume that you're more or less familiar with what BERT is on a high level, and focus more on the practical side by showing you how to utilize it in your work. Roughly speaking, BERT is a model that knows to represent text. You give it some sequence as an input, it then looks left and right several times and produces a vector representation for each word as the output. In their paper, the authors describe two ways to work with BERT, one as with "feature extraction" mechanism.
We propose a new approach to address the text classification problems when learning with partial labels is beneficial. Instead of offering each training sample a set of candidate labels, we assign negative-oriented labels to the ambiguous training examples if they are unlikely fall into certain classes. We construct our new maximum likelihood estimators with self-correction property, and prove that under some conditions, our estimators converge faster. Also we discuss the advantages of applying one of our estimator to a fully supervised learning problem. The proposed method has potential applicability in many areas, such as crowdsourcing, natural language processing and medical image analysis.
This paper studies the problem of learning a sequence of sentiment classification tasks. The learned knowledge from each task is retained and used to help future or subsequent task learning. This learning paradigm is called Lifelong Learning (LL). However, existing LL methods either only transfer knowledge forward to help future learning and do not go back to improve the model of a previous task or require the training data of the previous task to retrain its model to exploit backward/reverse knowledge transfer. This paper studies reverse knowledge transfer of LL in the context of naive Bayesian (NB) classification. It aims to improve the model of a previous task by leveraging future knowledge without retraining using its training data. This is done by exploiting a key characteristic of the generative model of NB. That is, it is possible to improve the NB classifier for a task by improving its model parameters directly by using the retained knowledge from other tasks. Experimental results show that the proposed method markedly outperforms existing LL baselines.
In this paper, we introduce an approach for leveraging available data across multiple locales sharing the same language to 1) improve domain classification model accuracy in Spoken Language Understanding and user experience even if new locales do not have sufficient data and 2) reduce the cost of scaling the domain classifier to a large number of locales. We propose a locale-agnostic universal domain classification model based on selective multi-task learning that learns a joint representation of an utterance over locales with different sets of domains and allows locales to share knowledge selectively depending on the domains. The experimental results demonstrate the effectiveness of our approach on domain classification task in the scenario of multiple locales with imbalanced data and disparate domain sets. The proposed approach outperforms other baselines models especially when classifying locale-specific domains and also low-resourced domains.
Archiving large sets of medical or cell images in digital libraries may require ordering randomly scattered sets of image data according to specific criteria, such as the spatial extent of a specific local color or contrast content that reveals different meaningful states of a physiological structure, tissue, or cell in a certain order, indicating progression or recession of a pathology, or the progressive response of a cell structure to treatment. Here we used a Self Organized Map (SOM)-based, fully automatic and unsupervised, classification procedure described in our earlier work and applied it to sets of minimally processed grayscale and/or color processed Scanning Electron Microscopy (SEM) images of CD4+ T-lymphocytes (so-called helper cells) with varying extent of HIV virion infection. It is shown that the quantization error in the SOM output after training permits to scale the spatial magnitude and the direction of change (+ or -) in local pixel contrast or color across images of a series with a reliability that exceeds that of any human expert. The procedure is easily implemented and fast, and represents a promising step towards low-cost automatic digital image archiving with minimal intervention of a human operator.
In machine learning and statistics, classification is a supervised learning approach in which the computer program learns from the data input given to it and then uses this learning to classify new observation. This data set may simply be bi-class (like identifying whether the person is male or female or that the mail is spam or non-spam) or it may be multi-class too. Some examples of classification problems are: speech recognition, handwriting recognition, bio metric identification, document classification etc. It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Online communication platforms are increasingly overwhelmed by rude or offensive comments, which can make people give up on discussion altogether. In response to this issue, the Jigsaw team and Google's Counter Abuse technology team collaborated with sites that have shared their comments and moderation decisions to create the Perspective API. Perspective helps online communities host better conversations by, for example, enabling moderators to more quickly identify which comments might violate their community guidelines. Several publishers have also worked on systems that provide feedback to users before they publish comments (e.g. the Coral Talk Plugin). Showing a pre-submit message that a comment might violate community guidelines, or that it will be held for review before publication, has already proven to be an effective tool to encourage users to think more carefully about how their comments might be received.
This report examines the Pinned AUC metric introduced in  and highlights some of its limitations. Pinned AUC provides a threshold-agnostic measure of unintended bias in a classification model, inspired by the ROC-AUC metric. However, as we highlight in this report, there are ways that the metric can obscure different kinds of unintended biases when the underlying class distributions on which bias is being measured are not carefully controlled. In , Pinned AUC is applied to a synthetically generated test set where all identity subgroups have identical representation of the classification labels. This method of controlling the class distributions avoids Pinned AUC's potential to obscure unintended biases. However, if the test data contains different distributions of classification labels between identities, Pinned AUC's measurement of bias can be skewed, either over or under representing the extent of unintended bias. In this report, the reasons for Pinned AUC's lack of robustness to variations in the class distributions are demonstrated. We also illustrate how unintended bias identified by Pinned AUC can be decomposed into the metrics presented in . To avoid requiring careful class balancing, which is hard to do on real data, instead of using Pinned AUC, the threshold agnostic metrics presented in  can be used; these are robust to variations in the class distributions and provide a more nuanced view of unintended bias.
The past year has ushered in an exciting age for Natural Language Processing using deep neural networks. Research in the field of using pre-trained models have resulted in massive leap in state-of-the-art results for many of the NLP tasks, such as text classification, natural language inference and question-answering. Some of the key milestones have been ELMo, ULMFiT and OpenAI Transformer. All these approaches allow us to pre-train an unsupervised language model on large corpus of data such as all wikipedia articles, and then fine-tune these pre-trained models on downstream tasks. Perhaps the most exciting event of the year in this area has been the release of BERT, a multilingual transformer based model that has achieved state-of-the-art results on various NLP tasks.