I personally do believe all the fancy ML research and advanced AI algorithm works have very minimal value if not zero until the date when they can be applied to real-life projects without asking the users for an insane amount of resources and excessive domain knowledge. And Hugging Face builds the bridge. Hugging Face is the home for thousands of pre-trained models which have made great contributions to democratizing artificial intelligence through open source and open science. Today, I want to give you an end-to-end code demo to compare two of the most popular pre-trained models by conducting a multi-label text classification analysis. The first model is SentenceTransformers (SBERT).
This notebook classifies movie reviews as positive or negative using the text of the review. This is an example of binary--or two-class--classification, an important and widely applicable kind of machine learning problem. We'll use the IMDB dataset that contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.
On the Internet, there are a lot of sources that provide enormous amounts of daily news. Further, the demand for information by users has been growing continuously, so it is important to classify the news in a way that lets users access the information they are interested in quickly and efficiently. Using this model, users would be able to identify news topics that go untracked, and/or make recommendations based on their prior interests. Thus, we aim to build models that take news headlines and short descriptions as inputs and produce news categories as outputs. The problem we will tackle is the classification of BBC News articles and their categories.
Text mining is a popular topic for exploring what text you have in documents etc. Text mining and NLP can help you discover different patterns in the text like uncovering certain words or phases which are commonly used, to identifying certain patterns and linkages between different texts/documents. Combining this work on Text mining you can use Word Clouds, time-series analysis, etc to discover other aspects and patterns in the text. Check out my previous blog posts (post 1, post 2) on performing Text Mining on documents (manifestos from some of the political parties from the last two national government elections in Ireland). These two posts gives you a simple indication of what is possible.
BERT (Bi-Directional Encoder Representation from Transformers) is that type of transformer introduced by Google which consists of only encoder and no decoder. Finally after following a similar approach on test data we perform our test evaluation using Mathew's correlation Coefficient which is highly recommended as a metric for classification type of problems. Voila!!! we finally fine tuned our bert model as per our use-case. Complete implementation can be found here…..
Though we're living through a time of extraordinary innovation in GPU-accelerated machine learning, the latest research papers frequently (and prominently) feature algorithms that are decades, in certain cases 70 years old. Some might contend that many of these older methods fall into the camp of'statistical analysis' rather than machine learning, and prefer to date the advent of the sector back only so far as 1957, with the invention of the Perceptron. Given the extent to which these older algorithms support and are enmeshed in the latest trends and headline-grabbing developments in machine learning, it's a contestable stance. So let's take a look at some of the'classic' building blocks underpinning the latest innovations, as well as some newer entries that are making an early bid for the AI hall of fame. In 2017 Google Research led a research collaboration culminating in the paper Attention Is All You Need.
The counterfactual token generation has been limited to perturbing only a single token in texts that are generally short and single sentences. These tokens are often associated with one of many sensitive attributes. With limited counterfactuals generated, the goal to achieve invariant nature for machine learning classification models towards any sensitive attribute gets bounded, and the formulation of Counterfactual Fairness gets narrowed. In this paper, we overcome these limitations by solving root problems and opening bigger domains for understanding. We have curated a resource of sensitive tokens and their corresponding perturbation tokens, even extending the support beyond traditionally used sensitive attributes like Age, Gender, Race to Nationality, Disability, and Religion. The concept of Counterfactual Generation has been extended to multi-token support valid over all forms of texts and documents. We define the method of generating counterfactuals by perturbing multiple sensitive tokens as Counterfactual Multi-token Generation. The method has been conceptualized to showcase significant performance improvement over single-token methods and validated over multiple benchmark datasets. The emendation in counterfactual generation propagates in achieving improved Counterfactual Multi-token Fairness.
Large pretrained language models (LMs) like BERT have improved performance in many disparate natural language processing (NLP) tasks. However, fine tuning such models requires a large number of training examples for each target task. Simultaneously, many realistic NLP problems are "few shot", without a sufficiently large training set. In this work, we propose a novel conditional neural process-based approach for few-shot text classification that learns to transfer from other diverse tasks with rich annotation. Our key idea is to represent each task using gradient information from a base model and to train an adaptation network that modulates a text classifier conditioned on the task representation. While previous task-aware few-shot learners represent tasks by input encoding, our novel task representation is more powerful, as the gradient captures input-output relationships of a task. Experimental results show that our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches on a collection of diverse few-shot tasks. We further conducted analysis and ablations to justify our design choices.
A range of applications for automatic machine learning need the generation process to be controllable. In this work, we propose a way to control the output via a sequence of simple actions, that are called semantic code classes. Finally, we present a semantic code classification task and discuss methods for solving this problem on the Natural Language to Machine Learning (NL2ML) dataset.
Contrastive learning has achieved remarkable success in representation learning via self-supervision in unsupervised settings. However, effectively adapting contrastive learning to supervised learning tasks remains as a challenge in practice. In this work, we introduce a dual contrastive learning (DualCL) framework that simultaneously learns the features of input samples and the parameters of classifiers in the same space. Specifically, DualCL regards the parameters of the classifiers as augmented samples associating to different labels and then exploits the contrastive learning between the input samples and the augmented samples. Empirical studies on five benchmark text classification datasets and their low-resource version demonstrate the improvement in classification accuracy and confirm the capability of learning discriminative representations of DualCL.